Date: Tue, 1 Jul 2014 00:28:56 +1000
Subject: Re: Creating a dict-like class that counts successful and failed key matches
From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Message-ID: <mailman.11344.1404138539.18130.python-list@python.org>

On Mon, Jun 30, 2014 at 11:43 PM,  <python@bdurham.com> wrote:
> As a diagnostic tool, I would like to create a dict-like class that counts
> successful and failed key matches by key. By failed I mean in the sense that
> a default value was returned vs. an exception raised. By count, I mean by
> tracking counts for individual keys vs. just total success/failure counts.
> The class needs to support setting values, retrieving values, and retrieving
> keys, items, and key/item pairs. Basically anything that a regular dict does,
> I'd like my modified class to do as well.

Sounds like you want to subclass dict, then. Something like this:

from collections import defaultdict

class StatsDict(dict):
    def __init__(self, *a, **ka):
        super().__init__(*a, **ka)
        self.success = defaultdict(int)  # key -> number of successful lookups
        self.fail = defaultdict(int)     # key -> number of failed lookups
    def __getitem__(self, item):
        try:
            ret = super().__getitem__(item)
            self.success[item] += 1
            return ret
        except KeyError:
            self.fail[item] += 1
            raise

On initialization, set up some places for keeping track of stats. On
item retrieval (I presume you're not also looking for stats on item
assignment - for that, you'd want to also override __setitem__),
increment either the success marker or the fail marker for that key,
based exactly on what you say: was something returned, or was an
exception raised.
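
If you did want stats on assignment as well, a minimal sketch might look
like the following. The class name AssignStatsDict and the "assigned"
counter are names I've made up for illustration. Note that in CPython,
dict.update() and dict.setdefault() don't go through an overridden
__setitem__, so this only counts plain d[key] = value assignments.

```python
from collections import defaultdict

class AssignStatsDict(dict):
    """Hypothetical variant: counts d[key] = value assignments per key."""
    def __init__(self, *a, **ka):
        super().__init__(*a, **ka)
        self.assigned = defaultdict(int)  # illustrative name, not from the original
    def __setitem__(self, key, value):
        self.assigned[key] += 1
        super().__setitem__(key, value)

d = AssignStatsDict()
d["foo"] = 1
d["foo"] = 2
d["bar"] = 3
print(dict(d.assigned))  # {'foo': 2, 'bar': 1}
```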

To get the stats, just look at the success and fail members:

>>> d = StatsDict()
>>> d["foo"] = 1234
>>> d["foo"]
1234
>>> d["spam"]
(chomp)
KeyError: 'spam'
>>> d["foo"]
1234
>>> d["foo"]
1234
>>> d["test"]
(chomp)
KeyError: 'test'
>>> len(d.success) # Unique successful keys
1
>>> len(d.fail) # Unique failed keys
2
>>> sum(d.success.values()) # Total successful lookups
3
>>> sum(d.fail.values()) # Total unsuccessful lookups
2

You can also interrogate the actual defaultdicts, e.g. to find the hottest N keys.
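
One way to pull out the hottest N keys is to hand the counts to
collections.Counter and ask for most_common(). This is only a sketch: the
"success" mapping below is a stand-in for d.success with made-up numbers,
not real data.

```python
from collections import Counter, defaultdict

# Stand-in for d.success after some lookups (made-up counts).
success = defaultdict(int, {"foo": 3, "bar": 7, "baz": 1})

# Counter accepts any mapping of key -> count; most_common(n) returns
# the n highest-count (key, count) pairs, largest first.
hottest = Counter(success).most_common(2)
print(hottest)  # [('bar', 7), ('foo', 3)]
```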

For everything other than simple key retrieval, this should function
identically to a regular dict. Its repr will be a dict's repr, its
iteration will be its keys, all its other methods will be available
and won't be affected by this change. Notably, the .get() method isn't
logged; if you use that and want to get stats for it, you'll have to
reimplement it - something like this:

    def get(self, k, d=None):
        try:
            return self[k]
        except KeyError:
            return d

The lookup self[k] handles the stats counting. If you let the call go
through to the dict implementation of get() instead, it does its own
lookup and never calls your overridden __getitem__, so nothing is counted.
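
To see the difference, here is a small self-contained sketch that compares
the inherited get() with the override. StatsDictWithGet is a name I've
picked just for this comparison; in practice you'd put get() straight
into StatsDict.

```python
from collections import defaultdict

class StatsDict(dict):
    def __init__(self, *a, **ka):
        super().__init__(*a, **ka)
        self.success = defaultdict(int)
        self.fail = defaultdict(int)
    def __getitem__(self, item):
        try:
            ret = super().__getitem__(item)
        except KeyError:
            self.fail[item] += 1
            raise
        self.success[item] += 1
        return ret

class StatsDictWithGet(StatsDict):
    """Hypothetical subclass used only to compare the two behaviours."""
    def get(self, k, d=None):
        try:
            return self[k]  # goes through the counting __getitem__
        except KeyError:
            return d

plain = StatsDict(foo=1)
plain.get("foo")
plain.get("spam")
print(dict(plain.success), dict(plain.fail))  # {} {}

logged = StatsDictWithGet(foo=1)
logged.get("foo")
logged.get("spam", 0)
print(dict(logged.success), dict(logged.fail))  # {'foo': 1} {'spam': 1}
```

With the override, a get() that falls back to its default is recorded as
a failure for that key, which is the sense of "failed" asked for in the
original question.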

This probably isn't exactly what you want, but it's a start, at least,
and something to tweak into submission :)

ChrisA