
Re: Looking for feedback on weighted voting algorithm

Started by	Michael Selik <michael.selik@gmail.com>
First post	2016-04-14 20:48 +0000
Last post	2016-04-16 00:35 +0000
Articles	3 — 2 participants


This discussion began before the indexed window, so earlier articles aren't shown. The article labeled "Started by" above is the oldest one visible, not the original post.

  Re: Looking for feedback on weighted voting algorithm Michael Selik <michael.selik@gmail.com> - 2016-04-14 20:48 +0000
    Re: Looking for feedback on weighted voting algorithm sohcahtoa82@gmail.com - 2016-04-15 10:52 -0700
      Re: Looking for feedback on weighted voting algorithm Michael Selik <michael.selik@gmail.com> - 2016-04-16 00:35 +0000

#107012 — Re: Looking for feedback on weighted voting algorithm

From	Michael Selik <michael.selik@gmail.com>
Date	2016-04-14 20:48 +0000
Subject	Re: Looking for feedback on weighted voting algorithm
Message-ID	<mailman.118.1460666904.15650.python-list@python.org>

On Thu, Apr 14, 2016, 7:37 PM justin walters <walters.justin01@gmail.com>
wrote:

> On Apr 14, 2016 9:41 AM, "Martin A. Brown" <martin@linux-ip.net> wrote:
> >
> >
> > Greetings Justin,
> >
> > >    score = sum_of_votes/num_of_votes
> >
> > >votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2), (91, 1)]
> >
> > >Specifically, I'm wondering if this is a good algorithm for
> > >weighted voting. Essentially a vote is weighted by the number of
> > >votes it counts as. I realize that this is an extremely simple
> > >algorithm, but I was wondering if anyone had suggestions on how to
> > >improve it.
> >
> > I snipped most of your code.  I don't see anything wrong with your
> > overall approach.  I will make one suggestion: watch out for
> > ZeroDivisionError.
> >
> >     try:
> >         score = sum_of_votes / num_of_votes
> >     except ZeroDivisionError:
> >         score = float('nan')
> >
> > In your example data, all of the weights were integers, which means
> > that a simple mean function would work, as well, if you expanded the
> > votes to an alternate representation:
> >
> >   votes = [72, 72, 72, 72, 96, 96, 96, 48, 48, 53, 26, 26, 26, 26,
> >            31, 31, 31, 68, 68, 91]
> >
> > But, don't bother!
> >
> > Your function can handle votes that have a float weight:
> >
> >   >>> weight([(4, 1.3), (1, 1),])
> >   2.695652173913044
> >
> > Have fun!
> >
> > -Martin
> >
> > --
> > Martin A. Brown
> > http://linux-ip.net/
>
> Thanks Martin!
>
> I'll add the check for division by zero. Didn't think about that. I think
> I'm going to sanitize input anyways, but always better to be safe than
> sorry.
>
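Martin's point about integer weights can be checked directly. The sketch below assumes Python 3; `weighted_mean` and `expand` are hypothetical names (Justin's own function, called `weight` in Martin's example, isn't shown in the thread). It expands each (rating, weight) pair into `weight` copies of the rating and compares a plain `statistics.mean` of the expanded list with the weighted mean.

```python
from statistics import mean

votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2), (91, 1)]

def weighted_mean(votes):
    """Weighted mean of (rating, weight) pairs (hypothetical helper)."""
    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weight = sum(weight for _, weight in votes)
    return weighted_sum / total_weight

def expand(votes):
    """Repeat each rating `weight` times; range() rejects float weights."""
    return [rating for rating, weight in votes for _ in range(weight)]

expanded = expand(votes)
print(len(expanded))         # 20
print(mean(expanded))        # 57.45
print(weighted_mean(votes))  # 57.45
```

The expansion only works for integer weights and uses memory proportional to the total weight, which is why Martin suggests not bothering: the weighted form handles float weights such as 1.3 unchanged.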

I suggest not worrying about sanitizing inputs. If someone provides bad
data, Python will do the right thing: stop the program and print an
explanation of what went wrong, often a more helpful message than one you'd
write. Use error handling mostly for when you want to do something *other*
than stop the program.
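To see what letting Python "do the right thing" looks like, here is a minimal sketch with no input checks at all. `weighted_score` is a hypothetical name and the bad inputs are made up for illustration; each one stops with a built-in exception whose type points at the problem.

```python
def weighted_score(votes):
    """Weighted mean of (rating, weight) pairs, deliberately without input checks."""
    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weight = sum(weight for _, weight in votes)
    return weighted_sum / total_weight

bad_inputs = {
    "wrong tuple size": [(72, 4, 1)],  # ValueError: too many values to unpack
    "missing rating": [(None, 4)],     # TypeError: None * int
    "no votes": [],                    # ZeroDivisionError: 0 / 0
}

for label, votes in bad_inputs.items():
    try:
        weighted_score(votes)
    except Exception as exc:
        print(f"{label}: {type(exc).__name__}: {exc}")
```

Not every bad input fails loudly, though: a negative weight such as `[(72, -1), (10, 2)]` raises nothing and quietly returns `-52.0`, so a range check on weights is something Python cannot do for you.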

I'm not sure I'd return NaN instead of raising ZeroDivisionError. NaNs can
be troublesome for downstream code, which might not notice them until the
results get confusing. A ZeroDivisionError is clear and easier to track down
because of the traceback.
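The NaN concern can be made concrete. In this sketch (function names are hypothetical), `score_or_nan` follows Martin's try/except suggestion and `score_or_raise` lets the error propagate. Because every ordering comparison with NaN is False, a downstream `max()` picks a different winner depending only on the order of its input, and nothing signals that anything went wrong.

```python
import math

def score_or_nan(votes):
    """Martin's suggestion: an empty vote list yields NaN instead of an error."""
    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weight = sum(weight for _, weight in votes)
    try:
        return weighted_sum / total_weight
    except ZeroDivisionError:
        return float("nan")

def score_or_raise(votes):
    """Michael's preference: let ZeroDivisionError propagate with a traceback."""
    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weight = sum(weight for _, weight in votes)
    return weighted_sum / total_weight

# Hypothetical downstream code picking the best-scoring item.
scores = {"a": score_or_nan([(72, 4)]), "b": score_or_nan([])}
print(math.isnan(scores["b"]))             # True
print(max(["a", "b"], key=scores.get))     # a
print(max(["b", "a"], key=scores.get))     # b: NaN compares False, so order decides
print(scores["b"] > 50, scores["b"] < 50)  # False False

try:
    score_or_raise([])
except ZeroDivisionError as exc:
    print("clear failure:", type(exc).__name__)  # clear failure: ZeroDivisionError
```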

What do you think of using generator expressions with the built-in sum?

    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weights = sum(weight for rating, weight in votes)
    score = weighted_sum / total_weights

It's two passes over the data as I wrote it, which intuitively seems slower,
but it might actually run faster because it uses the built-in sum instead of
a regular for loop. Not sure.
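Whether the two `sum()` calls beat a single explicit loop is easy to measure with the standard `timeit` module. The sketch below (function names hypothetical) checks that both versions agree and times them, without claiming which wins, since that depends on the interpreter and the size of the data.

```python
import timeit
from functools import partial

votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2), (91, 1)]

def two_sums(votes):
    """Michael's version: two passes, each driven by the built-in sum()."""
    weighted_sum = sum(rating * weight for rating, weight in votes)
    total_weights = sum(weight for rating, weight in votes)
    return weighted_sum / total_weights

def one_loop(votes):
    """Single pass with an explicit for loop."""
    weighted_sum = 0
    total_weights = 0
    for rating, weight in votes:
        weighted_sum += rating * weight
        total_weights += weight
    return weighted_sum / total_weights

assert two_sums(votes) == one_loop(votes)

if __name__ == "__main__":
    for fn in (two_sums, one_loop):
        seconds = timeit.timeit(partial(fn, votes), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s for 100,000 calls")
```

One behavioral difference matters more than speed: the two-pass version needs `votes` to be re-iterable. Given a one-shot iterator such as `iter(votes)`, the first `sum()` consumes it, the second sees nothing, and the call raises ZeroDivisionError; the single loop handles both.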



#107068

From	sohcahtoa82@gmail.com
Date	2016-04-15 10:52 -0700
Message-ID	<3567e3a1-6a55-4471-ad70-dc23e9ad05f4@googlegroups.com>
In reply to	#107012

On Thursday, April 14, 2016 at 1:48:40 PM UTC-7, Michael Selik wrote:
> I suggest not worrying about sanitizing inputs. If someone provides bad
> data, Python will do the right thing: stop the program and print an
> explanation of what went wrong, often a more helpful message than one you'd
> write. Use error handling mostly for when you want to do something *other*
> than stop the program.

I disagree with your notion of not sanitizing inputs.

If I'm the only person who will be using my program, then sure, I won't sanitize inputs.  Any error is my own mistake, and I should read the traceback and know what I did wrong.

But if ANY third party is going to be using it, especially as part of a web service of some kind, I'd much rather sanitize the inputs and tell the user they did something bad than to have the script crash and give a 500 Internal Server Error to the user.

Even worse, I definitely wouldn't want to give any hints of the source code organization by letting the user see the traceback.  I'll log it privately so I can view it myself, of course.
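One way to get both behaviors described here, a clear message for the user and a private traceback for the developer, is sketched below. It is framework-agnostic and everything in it is hypothetical: `handle_request` returns an `(HTTP status, body)` pair instead of using any real web framework's API, the validation rules (numeric values, finite floats, positive weights) are assumptions, and the traceback goes to the standard `logging` module via `logger.exception`.

```python
import logging
import math
from numbers import Real

logger = logging.getLogger("voting")  # hypothetical logger name


class InvalidVotes(ValueError):
    """User-facing validation error (hypothetical)."""


def validate_votes(raw):
    """Return a list of (rating, weight) pairs or raise InvalidVotes."""
    try:
        items = list(raw)
    except TypeError:
        raise InvalidVotes("votes must be a list of (rating, weight) pairs") from None
    if not items:
        raise InvalidVotes("at least one vote is required")
    votes = []
    for i, item in enumerate(items):
        try:
            rating, weight = item
        except (TypeError, ValueError):
            raise InvalidVotes(f"vote #{i} must be a (rating, weight) pair") from None
        # Assumed rules: both values numeric, floats finite, weight strictly positive.
        if not (isinstance(rating, Real) and isinstance(weight, Real)):
            raise InvalidVotes(f"vote #{i} must contain two numbers")
        if any(isinstance(x, float) and not math.isfinite(x) for x in (rating, weight)):
            raise InvalidVotes(f"vote #{i} contains NaN or infinity")
        if weight <= 0:
            raise InvalidVotes(f"vote #{i} must have a positive weight")
        votes.append((rating, weight))
    return votes


def handle_request(raw_votes):
    """Hypothetical handler: returns (HTTP status, response body)."""
    try:
        votes = validate_votes(raw_votes)
        weighted_sum = sum(rating * weight for rating, weight in votes)
        total_weight = sum(weight for _, weight in votes)
        score = weighted_sum / total_weight
    except InvalidVotes as exc:
        return 400, str(exc)  # tell the user what they did wrong
    except Exception:
        # Unexpected bug: full traceback to the private log, nothing to the user.
        logger.exception("failed to score votes")
        return 500, "Internal Server Error"
    return 200, f"{score:.2f}"
```

For example, `handle_request([(72, 4), (96, 3)])` returns `(200, "82.29")`, while `handle_request([])` returns `(400, "at least one vote is required")` without exposing any source layout.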


#107073

From	Michael Selik <michael.selik@gmail.com>
Date	2016-04-16 00:35 +0000
Message-ID	<mailman.33.1460766914.6324.python-list@python.org>
In reply to	#107068

On Fri, Apr 15, 2016, 7:56 PM <sohcahtoa82@gmail.com> wrote:

> On Thursday, April 14, 2016 at 1:48:40 PM UTC-7, Michael Selik wrote:
> > I suggest not worrying about sanitizing inputs. If someone provides bad
> > data, Python will do the right thing: stop the program and print an
> > explanation of what went wrong, often a more helpful message than one you'd
> > write. Use error handling mostly for when you want to do something *other*
> > than stop the program.
> >
> > I'm not sure I'd use NaN instead of raise division by zero error. NaNs can
> > be troublesome for downstream code that might not notice until it gets
> > confusing. A div-by-zero error is clear and easier to track down because of
> > the traceback.
>
> I'd much rather sanitize the inputs and tell the user they did something
> bad than to have the script crash and give a 500 Internal Server Error to
> the user.
>

Right, my advice was only good if the user can read the original error and
traceback, and can restart the program, etc.

> Even worse, I definitely wouldn't want to give any hints of the source code
> organization by letting the user see the traceback.
>

Depends on the kind of user you have.


