Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #93889 > unrolled thread
| Started by | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| First post | 2015-07-16 11:51 +1000 |
| Last post | 2015-07-16 09:53 +0300 |
| Articles | 8 — 5 participants |
Back to article view | Back to comp.lang.python
Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 11:51 +1000
Re: Mapping, with sequence as key, wildcard and subsequence matching Steven D'Aprano <steve@pearwood.info> - 2015-07-16 14:01 +1000
Re: Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 15:53 +1000
Re: Mapping, with sequence as key, wildcard and subsequence matching Ethan Furman <ethan@stoneleaf.us> - 2015-07-15 23:31 -0700
Re: Mapping, with sequence as key, wildcard and subsequence matching Ben Finney <ben+python@benfinney.id.au> - 2015-07-16 16:37 +1000
Re: Mapping, with sequence as key, wildcard and subsequence matching Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-16 08:09 +0100
Re: Mapping, with sequence as key, wildcard and subsequence matching Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-16 08:24 +0100
Re: Mapping, with sequence as key, wildcard and subsequence matching Marko Rauhamaa <marko@pacujo.net> - 2015-07-16 09:53 +0300
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2015-07-16 11:51 +1000 |
| Subject | Mapping, with sequence as key, wildcard and subsequence matching |
| Message-ID | <mailman.552.1437011478.3674.python-list@python.org> |
Howdy all,
What well-defined data type exists with the following properties:
* Mapping, key → value.
* Each key is a sequence (e.g. `tuple`) of items such as text strings.
* Items in a key may be the sentinel `ANY` value, which will match any
value at that position.
* A key may specify that it will match *only* sequences of the same
length.
* A key may specify that it will match sequences with arbitrarily many
additional unspecified items.
If it matters: I'm using Python 2 to implement this.
The “match any item” type is simple enough; I'm aware of the
`unittest.mock.ANY` sentinel, and that meets my needs.
What I'm hoping to find is a general, existing, solution to the problem
of a mapping that uses keys which, in their value, encapsulate the fact
that the key does, or does not, match sequences of arbitrary length
longer than itself.
Example of what I'd like::
from unittest import mock
responses = SubsequenceMatchingType([
('foo', 'bar', 'baz'):
"Simple three-argument command.",
('wibble', mock.ANY, 'wubble'):
"Second argument ANY.",
('lorem', 'ipsum'):
"Arbitrary trailing arguments.",
])
responses[('foo', 'bar')] # raises KeyError
responses[('foo', 'bar', 'baz')] # → "Simple three-argument command."
responses[('foo', 'bar', 'baz', 'bork')] # raises KeyError
responses[('wibble', 'wobble')] # raises KeyError
responses[('wibble', 'wobble', 'wubble')] # → "Second argument ANY."
responses[('wibble', 'wobble', 'baz')] # raises KeyError
responses[('wibble', 'wobble', 'wubble', 'baz')] # raises KeyError
responses[('lorem',)] # raises KeyError
responses[('lorem', 'ipsum')] # → "Arbitrary trailing arguments."
responses[('lorem', 'ipsum', 'dolor', 'sit', 'amet')] # → "Arbitrary trailing arguments."
So some kind of distinguishing feature needs to be added to those keys
which should match arbitrary trailing items: not ‘('foo', 'bar', 'baz')’
but ‘('lorem', 'ipsum')’.
The trouble I'm having is that the keys in the mapping, and the
candidate sequences, are simple tuples. Adding a sentinel item at the
end would make it a different sequence, and indistinguishable from
specifying an additional item at that position.
Is this a problem already solved (and implemented, and debugged!) in a
general form that I can use?
--
\ “I am too firm in my consciousness of the marvelous to be ever |
`\ fascinated by the mere supernatural …” —Joseph Conrad, _The |
_o__) Shadow-Line_ |
Ben Finney
[toc] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-07-16 14:01 +1000 |
| Message-ID | <55a72c8b$0$1671$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #93889 |
On Thu, 16 Jul 2015 11:51 am, Ben Finney wrote: > Howdy all, > > What well-defined data type exists with the following properties: > > * Mapping, key → value. > > * Each key is a sequence (e.g. `tuple`) of items such as text strings. > > * Items in a key may be the sentinel `ANY` value, which will match any > value at that position. > > * A key may specify that it will match *only* sequences of the same > length. > > * A key may specify that it will match sequences with arbitrarily many > additional unspecified items. Sounds like a regular expression. Remember that computer science theoretical regular expressions don't just match strings, they can apply to any sequence of primitive values. In your case, you only need two special tokens, match-one and match-one-or-more (which may be like r'.' and r'.*'). If you can guarantee that the key items have no spaces in them, you may even be able to use a string re. Assemble the key from the tuple like so: - replace ONE by "." - replace ONE-OR-MORE by ".*" - return " ".join(keyitems) Otherwise, you can create your own sequence type and simple regex engine (you only need two meta-items, ONE and ONE-OR-MORE) to perform equality tests. If you can use a string, make it a subclass with an __eq__ method that returns re.match(key, self). Either way, you can then use your custom class as the keys in a mapping type. You can't use a dict for the mapping, not unless you're smarter than me, due to the requirement to hash the keys. You can use a simple list of tuples [(key, value), ...] if O(N) searches are fast enough. (Surely they'll be fast enough for a proof of concept?) That at least is guaranteed to work: when doing a lookup, you simply look at every key until you find (at least one) that matches. If there are no matches by the time you hit the end of the list, you know that there cannot be any matches! A more complex, but potentially faster for large N, method would be to keep the list sorted and use a binary search (bisect module). Or use a binary tree. The hard part is that for this to work, you need a well-defined less-than operation as well as equality, and I'm not sure how to implement that for ONE and ONE-OR-MORE. I'm sure it can be done though. At a guess... make ONE less than every string, and every string less than ONE-OR-MORE, and then just use regular tuple order comparisons. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2015-07-16 15:53 +1000 |
| Message-ID | <mailman.557.1437026017.3674.python-list@python.org> |
| In reply to | #93894 |
Steven D'Aprano <steve@pearwood.info> writes: > Sounds like a regular expression. Remember that computer science > theoretical regular expressions don't just match strings, they can > apply to any sequence of primitive values. Good insight, thank you. > In your case, you only need two special tokens, match-one and > match-one-or-more (which may be like r'.' and r'.*'). It actually needs “one” and “zero-or-more”. Which does match your analogous regex ‘.’ and ‘.*’ patterns. > Otherwise, you can create your own sequence type and simple regex > engine (you only need two meta-items, [ONE and ZERO-OR-MORE]) to > perform equality tests. It may have to come to that. That's not such a problem, but: > You can't use a dict for the mapping, not unless you're smarter than > me, due to the requirement to hash the keys. Dang. It's the mapping that I really need to solve, I think. A mapping that has a custom “does this candidate match any existing key” and “return the value for this key” to defer to the matching behaviour described above. Are those the ‘__contains__’, ‘__getitem__’ methods? What actually is the API of a mapping type, that would need to be customised for this application? -- \ “If we listen only to those who are like us, we will squander | `\ the great opportunity before us: To live together peacefully in | _o__) a world of unresolved differences.” —David Weinberger | Ben Finney
[toc] | [prev] | [next] | [standalone]
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2015-07-15 23:31 -0700 |
| Message-ID | <mailman.561.1437028279.3674.python-list@python.org> |
| In reply to | #93894 |
On 07/15/2015 10:53 PM, Ben Finney wrote:
> Steven D'Aprano <steve@pearwood.info> writes:
>> You can't use a dict for the mapping, not unless you're smarter than
>> me, due to the requirement to hash the keys.
>
> Dang. It's the mapping that I really need to solve, I think. A mapping
> that has a custom “does this candidate match any existing key” and
> “return the value for this key” to defer to the matching behaviour
> described above.
>
> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually is
> the API of a mapping type, that would need to be customised for this
> application?
The problem is that potential key matches are found by hashes, and the hash of
('value1', ANY, 'next_value')
and
('value1, 'value2', 'next_value')
and
('value1', 'value3', 'next_value')
will not and cannot be the same.
If you fiddle with your object such that all instances hash to the same value then you lose the O(1) lookup for the dict -- you basically have a list.
--
~Ethan~
[toc] | [prev] | [next] | [standalone]
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2015-07-16 16:37 +1000 |
| Message-ID | <mailman.563.1437028680.3674.python-list@python.org> |
| In reply to | #93894 |
Ethan Furman <ethan@stoneleaf.us> writes: > On 07/15/2015 10:53 PM, Ben Finney wrote: > > Are those the ‘__contains__’, ‘__getitem__’ methods? What actually > > is the API of a mapping type, that would need to be customised for > > this application? > > The problem is that potential key matches are found by hashes For the Python ‘dict’ type, yes. I already know that I don't want that type, I want a custom mapping type which matches keys according to the algorithm I specify. So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking “how do I implement a custom type which can duck-type for ‘dict’ but have a different key-lookup implementation?”. -- \ “As the most participatory form of mass speech yet developed, | `\ the Internet deserves the highest protection from governmental | _o__) intrusion.” —U.S. District Court Judge Dalzell | Ben Finney
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-07-16 08:09 +0100 |
| Message-ID | <mailman.565.1437030570.3674.python-list@python.org> |
| In reply to | #93894 |
On 16/07/2015 07:37, Ben Finney wrote: > Ethan Furman <ethan@stoneleaf.us> writes: > >> On 07/15/2015 10:53 PM, Ben Finney wrote: >>> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually >>> is the API of a mapping type, that would need to be customised for >>> this application? >> >> The problem is that potential key matches are found by hashes > > For the Python ‘dict’ type, yes. I already know that I don't want that > type, I want a custom mapping type which matches keys according to the > algorithm I specify. > > So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking > “how do I implement a custom type which can duck-type for ‘dict’ but > have a different key-lookup implementation?”. > https://docs.python.org/3/reference/datamodel.html?emulating-container-types <quote> The collections module provides a MutableMapping abstract base class to help create those methods from a base set of __getitem__(), __setitem__(), __delitem__(), and keys() </quote> Hence https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping Hunting for examples got me to http://code.activestate.com/recipes/578096-a-mutablemapping-that-can-use-unhashable-objects-a/ so fingers crossed, I'm just hoping that we're going in the right direction. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-07-16 08:24 +0100 |
| Message-ID | <mailman.568.1437031467.3674.python-list@python.org> |
| In reply to | #93894 |
On 16/07/2015 08:09, Mark Lawrence wrote: > On 16/07/2015 07:37, Ben Finney wrote: >> Ethan Furman <ethan@stoneleaf.us> writes: >> >>> On 07/15/2015 10:53 PM, Ben Finney wrote: >>>> Are those the ‘__contains__’, ‘__getitem__’ methods? What actually >>>> is the API of a mapping type, that would need to be customised for >>>> this application? >>> >>> The problem is that potential key matches are found by hashes >> >> For the Python ‘dict’ type, yes. I already know that I don't want that >> type, I want a custom mapping type which matches keys according to the >> algorithm I specify. >> >> So, I'm not asking “how do I make ‘dict’ do this?”. I am instead asking >> “how do I implement a custom type which can duck-type for ‘dict’ but >> have a different key-lookup implementation?”. >> > > https://docs.python.org/3/reference/datamodel.html?emulating-container-types > > > <quote> > The collections module provides a MutableMapping abstract base class to > help create those methods from a base set of __getitem__(), > __setitem__(), __delitem__(), and keys() > </quote> > > Hence > https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping > > > Hunting for examples got me to > http://code.activestate.com/recipes/578096-a-mutablemapping-that-can-use-unhashable-objects-a/ > so fingers crossed, I'm just hoping that we're going in the right > direction. > Drat, should have scrolled down one more page on the activestate recipe, Stephen D'Aprano comment. <quote> This is very ingenious, but I wouldn't want to touch it with a ten foot pole for production code </quote> Eric Snow replied. <quote> Oh yeah. I wouldn't use it in production either. :) I expect you could iron out the wrinkles, but there are better solutions. </quote> -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
[toc] | [prev] | [next] | [standalone]
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2015-07-16 09:53 +0300 |
| Message-ID | <871tg8bnk7.fsf@elektro.pacujo.net> |
| In reply to | #93889 |
Ben Finney <ben+python@benfinney.id.au>:
> What well-defined data type exists with the following properties:
>
> * Mapping, key → value.
>
> * Each key is a sequence (e.g. `tuple`) of items such as text strings.
>
> * Items in a key may be the sentinel `ANY` value, which will match any
> value at that position.
>
> * A key may specify that it will match *only* sequences of the same
> length.
>
> * A key may specify that it will match sequences with arbitrarily many
> additional unspecified items.
What you are describing is a tree; the sequences specify paths:
========================================================================
class Node:
ANY = object()
UNDEFINED = object()
def __init__(self):
self.value = Node.UNDEFINED
self.children = {}
def setvalue(self, keyseq, value, offset=0):
try:
next = keyseq[offset]
except IndexError:
self.value = value
return
if next is Node.ANY:
raise KeyError()
try:
child = self.children[next]
except KeyError:
self.children[next] = child = Node()
child.setvalue(keyseq, value, offset + 1)
def lookup(self, keyseq, offset=0):
try:
next = keyseq[offset]
except IndexError:
for value in self.yieldall():
yield value
return
if next is Node.ANY:
for child in self.children.itervalues():
for value in child.lookup(keyseq, offset + 1):
yield value
return
try:
child = self.children[next]
except KeyError:
return
for value in child.lookup(keyseq, offset + 1):
yield value
def yieldall(self):
if self.value is not Node.UNDEFINED:
yield self.value
for child in self.children.itervalues():
for value in child.yieldall():
yield value
=====================================================================
Usage:
=====================================================================
tree = Node()
tree.setvalue((1, 2), 3)
print list(tree.lookup((1, 2)))
print list(tree.lookup((Node.ANY, 2)))
=====================================================================
CAVEAT: Code barely tested.
Marko
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web