Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #93889 > unrolled thread

Mapping, with sequence as key, wildcard and subsequence matching

Started byBen Finney <ben+python@benfinney.id.au>
First post2015-07-16 11:51 +1000
Last post2015-07-16 09:53 +0300
Articles 8 — 5 participants

Back to article view | Back to comp.lang.python


Contents

  Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 11:51 +1000
    Re: Mapping, with sequence as key, wildcard and subsequence matching Steven D'Aprano <steve@pearwood.info> - 2015-07-16 14:01 +1000
      Re: Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 15:53 +1000
      Re: Mapping, with sequence as key, wildcard and subsequence matching Ethan Furman <ethan@stoneleaf.us> - 2015-07-15 23:31 -0700
      Re: Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 16:37 +1000
      Re: Mapping, with sequence as key, wildcard and subsequence matching Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-16 08:09 +0100
      Re: Mapping, with sequence as key, wildcard and subsequence matching Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-16 08:24 +0100
    Re: Mapping, with sequence as key, wildcard and subsequence matching Marko Rauhamaa <marko@pacujo.net> - 2015-07-16 09:53 +0300

#93889 — Mapping, with sequence as key, wildcard and subsequence matching

FromBen Finney <ben+python@benfinney.id.au>
Date2015-07-16 11:51 +1000
SubjectMapping, with sequence as key, wildcard and subsequence matching
Message-ID<mailman.552.1437011478.3674.python-list@python.org>
Howdy all,

What well-defined data type exists with the following properties:

* Mapping, key → value.

* Each key is a sequence (e.g. `tuple`) of items such as text strings.

* Items in a key may be the sentinel `ANY` value, which will match any
  value at that position.

* A key may specify that it will match *only* sequences of the same
  length.

* A key may specify that it will match sequences with arbitrarily many
  additional unspecified items.

If it matters: I'm using Python 2 to implement this.


The “match any item” type is simple enough; I'm aware of the
`unittest.mock.ANY` sentinel, and that meets my needs.

What I'm hoping to find is a general, existing, solution to the problem
of a mapping that uses keys which, in their value, encapsulate the fact
that the key does, or does not, match sequences of arbitrary length
longer than itself.

Example of what I'd like::

    from unittest import mock

    responses = SubsequenceMatchingType([
            ('foo', 'bar', 'baz'):
                "Simple three-argument command.",
            ('wibble', mock.ANY, 'wubble'):
                "Second argument ANY.",
            ('lorem', 'ipsum'):
                "Arbitrary trailing arguments.",
            ])

    responses[('foo', 'bar')]  # raises KeyError
    responses[('foo', 'bar', 'baz')]  # → "Simple three-argument command."
    responses[('foo', 'bar', 'baz', 'bork')]  # raises KeyError

    responses[('wibble', 'wobble')]  # raises KeyError
    responses[('wibble', 'wobble', 'wubble')]  # → "Second argument ANY."
    responses[('wibble', 'wobble', 'baz')]  # raises KeyError
    responses[('wibble', 'wobble', 'wubble', 'baz')]  # raises KeyError

    responses[('lorem',)]  # raises KeyError
    responses[('lorem', 'ipsum')]  # → "Arbitrary trailing arguments."
    responses[('lorem', 'ipsum', 'dolor', 'sit', 'amet')]  # → "Arbitrary trailing arguments."

So some kind of distinguishing feature needs to be added to those keys
which should match arbitrary trailing items: not ‘('foo', 'bar', 'baz')’
but ‘('lorem', 'ipsum')’.

The trouble I'm having is that the keys in the mapping, and the
candidate sequences, are simple tuples. Adding a sentinel item at the
end would make it a different sequence, and indistinguishable from
specifying an additional item at that position.

Is this a problem already solved (and implemented, and debugged!) in a
general form that I can use?

-- 
 \      “I am too firm in my consciousness of the marvelous to be ever |
  `\       fascinated by the mere supernatural …” —Joseph Conrad, _The |
_o__)                                                     Shadow-Line_ |
Ben Finney

[toc] | [next] | [standalone]


#93894

FromSteven D'Aprano <steve@pearwood.info>
Date2015-07-16 14:01 +1000
Message-ID<55a72c8b$0$1671$c3e8da3$5496439d@news.astraweb.com>
In reply to#93889
On Thu, 16 Jul 2015 11:51 am, Ben Finney wrote:

> Howdy all,
> 
> What well-defined data type exists with the following properties:
> 
> * Mapping, key → value.
> 
> * Each key is a sequence (e.g. `tuple`) of items such as text strings.
> 
> * Items in a key may be the sentinel `ANY` value, which will match any
>   value at that position.
> 
> * A key may specify that it will match *only* sequences of the same
>   length.
> 
> * A key may specify that it will match sequences with arbitrarily many
>   additional unspecified items.


Sounds like a regular expression. Remember that computer science theoretical
regular expressions don't just match strings, they can apply to any
sequence of primitive values.

In your case, you only need two special tokens, match-one and
match-one-or-more (which may be like r'.' and r'.*').

If you can guarantee that the key items have no spaces in them, you may even
be able to use a string re. Assemble the key from the tuple like so:

- replace ONE by "."
- replace ONE-OR-MORE by ".*"
- return " ".join(keyitems)


Otherwise, you can create your own sequence type and simple regex engine
(you only need two meta-items, ONE and ONE-OR-MORE) to perform equality
tests.

If you can use a string, make it a subclass with an __eq__ method that
returns re.match(key, self). Either way, you can then use your custom class
as the keys in a mapping type.

You can't use a dict for the mapping, not unless you're smarter than me, due
to the requirement to hash the keys.

You can use a simple list of tuples [(key, value), ...] if O(N) searches are
fast enough. (Surely they'll be fast enough for a proof of concept?) That
at least is guaranteed to work: when doing a lookup, you simply look at
every key until you find (at least one) that matches. If there are no
matches by the time you hit the end of the list, you know that there cannot
be any matches!

A more complex, but potentially faster for large N, method would be to keep
the list sorted and use a binary search (bisect module). Or use a binary
tree. The hard part is that for this to work, you need a well-defined
less-than operation as well as equality, and I'm not sure how to implement
that for ONE and ONE-OR-MORE. I'm sure it can be done though.

At a guess... make ONE less than every string, and every string less than
ONE-OR-MORE, and then just use regular tuple order comparisons.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#93897

FromBen Finney <ben+python@benfinney.id.au>
Date2015-07-16 15:53 +1000
Message-ID<mailman.557.1437026017.3674.python-list@python.org>
In reply to#93894
Steven D'Aprano <steve@pearwood.info> writes:

> Sounds like a regular expression. Remember that computer science
> theoretical regular expressions don't just match strings, they can
> apply to any sequence of primitive values.

Good insight, thank you.

> In your case, you only need two special tokens, match-one and
> match-one-or-more (which may be like r'.' and r'.*').

It actually needs “one” and “zero-or-more”. Which does match your
analogous regex ‘.’ and ‘.*’ patterns.

> Otherwise, you can create your own sequence type and simple regex
> engine (you only need two meta-items, [ONE and ZERO-OR-MORE]) to
> perform equality tests.

It may have to come to that. That's not such a problem, but:

> You can't use a dict for the mapping, not unless you're smarter than
> me, due to the requirement to hash the keys.

Dang. It's the mapping that I really need to solve, I think. A mapping
that has a custom “does this candidate match any existing key” and
“return the value for this key” to defer to the matching behaviour
described above.

Are those the ‘__contains__’, ‘__getitem__’ methods? What actually is
the API of a mapping type, that would need to be customised for this
application?

-- 
 \       “If we listen only to those who are like us, we will squander |
  `\   the great opportunity before us: To live together peacefully in |
_o__)            a world of unresolved differences.” —David Weinberger |
Ben Finney

[toc] | [prev] | [next] | [standalone]


#93901

FromEthan Furman <ethan@stoneleaf.us>
Date2015-07-15 23:31 -0700
Message-ID<mailman.561.1437028279.3674.python-list@python.org>
In reply to#93894
On 07/15/2015 10:53 PM, Ben Finney wrote:
> Steven D'Aprano <steve@pearwood.info> writes:

>> You can't use a dict for the mapping, not unless you're smarter than
>> me, due to the requirement to hash the keys.
>
> Dang. It's the mapping that I really need to solve, I think. A mapping
> that has a custom “does this candidate match any existing key” and
> “return the value for this key” to defer to the matching behaviour
> described above.
>
> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually is
> the API of a mapping type, that would need to be customised for this
> application?

The problem is that potential key matches are found by hashes, and the hash of

   ('value1', ANY, 'next_value')

and

   ('value1, 'value2', 'next_value')

and

   ('value1', 'value3', 'next_value')

will not and cannot be the same.

If you fiddle with your object such that all instances hash to the same value then you lose the O(1) lookup for the dict -- you basically have a list.

--
~Ethan~

[toc] | [prev] | [next] | [standalone]


#93903

FromBen Finney <ben+python@benfinney.id.au>
Date2015-07-16 16:37 +1000
Message-ID<mailman.563.1437028680.3674.python-list@python.org>
In reply to#93894
Ethan Furman <ethan@stoneleaf.us> writes:

> On 07/15/2015 10:53 PM, Ben Finney wrote:
> > Are those the ‘__contains__’, ‘__getitem__’ methods? What actually
> > is the API of a mapping type, that would need to be customised for
> > this application?
>
> The problem is that potential key matches are found by hashes

For the Python ‘dict’ type, yes. I already know that I don't want that
type, I want a custom mapping type which matches keys according to the
algorithm I specify.

So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking
“how do I implement a custom type which can duck-type for ‘dict’ but
have a different key-lookup implementation?”.

-- 
 \       “As the most participatory form of mass speech yet developed, |
  `\    the Internet deserves the highest protection from governmental |
_o__)                   intrusion.” —U.S. District Court Judge Dalzell |
Ben Finney

[toc] | [prev] | [next] | [standalone]


#93906

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2015-07-16 08:09 +0100
Message-ID<mailman.565.1437030570.3674.python-list@python.org>
In reply to#93894
On 16/07/2015 07:37, Ben Finney wrote:
> Ethan Furman <ethan@stoneleaf.us> writes:
>
>> On 07/15/2015 10:53 PM, Ben Finney wrote:
>>> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually
>>> is the API of a mapping type, that would need to be customised for
>>> this application?
>>
>> The problem is that potential key matches are found by hashes
>
> For the Python ‘dict’ type, yes. I already know that I don't want that
> type, I want a custom mapping type which matches keys according to the
> algorithm I specify.
>
> So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking
> “how do I implement a custom type which can duck-type for ‘dict’ but
> have a different key-lookup implementation?”.
>

https://docs.python.org/3/reference/datamodel.html?emulating-container-types 


<quote>
The collections module provides a MutableMapping abstract base class to 
help create those methods from a base set of __getitem__(), 
__setitem__(), __delitem__(), and keys()
</quote>

Hence 
https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping

Hunting for examples got me to 
http://code.activestate.com/recipes/578096-a-mutablemapping-that-can-use-unhashable-objects-a/ 
so fingers crossed, I'm just hoping that we're going in the right direction.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#93909

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2015-07-16 08:24 +0100
Message-ID<mailman.568.1437031467.3674.python-list@python.org>
In reply to#93894
On 16/07/2015 08:09, Mark Lawrence wrote:
> On 16/07/2015 07:37, Ben Finney wrote:
>> Ethan Furman <ethan@stoneleaf.us> writes:
>>
>>> On 07/15/2015 10:53 PM, Ben Finney wrote:
>>>> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually
>>>> is the API of a mapping type, that would need to be customised for
>>>> this application?
>>>
>>> The problem is that potential key matches are found by hashes
>>
>> For the Python ‘dict’ type, yes. I already know that I don't want that
>> type, I want a custom mapping type which matches keys according to the
>> algorithm I specify.
>>
>> So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking
>> “how do I implement a custom type which can duck-type for ‘dict’ but
>> have a different key-lookup implementation?”.
>>
>
> https://docs.python.org/3/reference/datamodel.html?emulating-container-types
>
>
> <quote>
> The collections module provides a MutableMapping abstract base class to
> help create those methods from a base set of __getitem__(),
> __setitem__(), __delitem__(), and keys()
> </quote>
>
> Hence
> https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping
>
>
> Hunting for examples got me to
> http://code.activestate.com/recipes/578096-a-mutablemapping-that-can-use-unhashable-objects-a/
> so fingers crossed, I'm just hoping that we're going in the right
> direction.
>

Drat, should have scrolled down one more page on the activestate recipe, 
Stephen D'Aprano comment.

<quote>
This is very ingenious, but I wouldn't want to touch it with a ten foot 
pole for production code
</quote>

Eric Snow replied.

<quote>
Oh yeah. I wouldn't use it in production either. :) I expect you could 
iron out the wrinkles, but there are better solutions.
</quote>

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#93905

FromMarko Rauhamaa <marko@pacujo.net>
Date2015-07-16 09:53 +0300
Message-ID<871tg8bnk7.fsf@elektro.pacujo.net>
In reply to#93889
Ben Finney <ben+python@benfinney.id.au>:

> What well-defined data type exists with the following properties:
>
> * Mapping, key → value.
>
> * Each key is a sequence (e.g. `tuple`) of items such as text strings.
>
> * Items in a key may be the sentinel `ANY` value, which will match any
>   value at that position.
>
> * A key may specify that it will match *only* sequences of the same
>   length.
>
> * A key may specify that it will match sequences with arbitrarily many
>   additional unspecified items.

What you are describing is a tree; the sequences specify paths:

========================================================================
class Node:
    ANY = object()
    UNDEFINED = object()

    def __init__(self):
        self.value = Node.UNDEFINED
        self.children = {}

    def setvalue(self, keyseq, value, offset=0):
        try:
            next = keyseq[offset]
        except IndexError:
            self.value = value
            return
        if next is Node.ANY:
            raise KeyError()
        try:
            child = self.children[next]
        except KeyError:
            self.children[next] = child = Node()
        child.setvalue(keyseq, value, offset + 1)

    def lookup(self, keyseq, offset=0):
        try:
            next = keyseq[offset]
        except IndexError:
            for value in self.yieldall():
                yield value
            return
        if next is Node.ANY:
            for child in self.children.itervalues():
                for value in child.lookup(keyseq, offset + 1):
                    yield value
            return
        try:
            child = self.children[next]
        except KeyError:
            return
        for value in child.lookup(keyseq, offset + 1):
            yield value

    def yieldall(self):
        if self.value is not Node.UNDEFINED:
            yield self.value
        for child in self.children.itervalues():
            for value in child.yieldall():
                yield value
=====================================================================

Usage:

=====================================================================
tree = Node()
tree.setvalue((1, 2), 3)
print list(tree.lookup((1, 2)))
print list(tree.lookup((Node.ANY, 2)))
=====================================================================

CAVEAT: Code barely tested.


Marko

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web