Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail From: Michael Selik Newsgroups: comp.lang.python Subject: Re: Looking for feedback on weighted voting algorithm Date: Thu, 14 Apr 2016 20:48:12 +0000 Lines: 79 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 X-Trace: news.uni-berlin.de dxH5ycxlbVgsWTyiZ8gg0AAjdCKYzqQ+vXcQX40trxGw== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.001 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'essentially': 0.04; 'float': 0.05; '(1,': 0.09; '3),': 0.09; 'loop.': 0.09; 'wrong,': 0.09; 'zero.': 0.09; 'python': 0.10; 'suggest': 0.15; 'thu,': 0.15; '*other*': 0.16; '1),': 0.16; '2),': 0.16; '2016': 0.16; '4),': 0.16; 'anyways,': 0.16; 'downstream': 0.16; 'integers,': 0.16; 'justin': 0.16; 'loops': 0.16; 'nan': 0.16; 'nans': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'sure.': 0.16; 'traceback.': 0.16; 'write.': 0.16; 'wrote:': 0.16; 'alternate': 0.18; 'expanded': 0.18; 'try:': 0.18; '>': 0.18; 'email addr:gmail.com>': 0.18; 'input': 0.18; '>>>': 0.20; 'algorithm': 0.20; 'work,': 0.21; 'to:2**1': 0.21; 'martin': 0.22; '31,': 0.22; 'am,': 0.23; 'code.': 0.23; 'wrote': 0.23; 'header :In-Reply-To:1': 0.24; 'wondering': 0.25; 'example': 0.26; 'helpful': 0.27; 'error': 0.27; 'handling': 0.27; 'mostly': 0.27; 'message-id:@mail.gmail.com': 0.27; '14,': 0.27; 'data,': 0.27; 'function': 0.28; 'went': 0.28; 'regular': 0.29; 'division': 0.29; 'voting.': 0.29; 'raise': 0.29; "i'm": 0.30; 'print': 0.30; 'that.': 0.30; 'code': 0.30; "i'd": 0.31; 'error.': 0.31; 'anyone': 0.32; 'realize': 0.32; "i'll": 0.33; 'except': 0.34; 'handle': 0.34; 'add': 0.34; 'list': 0.34; 'gets': 0.35; 'received:google.com': 0.35; 'clear': 0.35; 'skip:> 10': 0.35; 'something': 0.35; 'received:74.125.82': 0.35; 'but': 0.36; 'instead': 0.36; 'faster': 0.36; 'to:addr:python-list': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'skip:& 10': 0.37; 'thanks': 0.37; 'skip:z 10': 0.38; 'wrong': 0.38; 'anything': 0.38; 'someone': 0.38; 'mean': 0.38; 'means': 0.39; 'sure': 0.39; "didn't": 0.39; 'to:addr:python.org': 0.40; 'your': 0.60; 'watch': 0.62; 'more': 0.63; 'safe': 0.63; 'skip:\xc2 10': 0.67; 'sum': 0.69; 'greetings': 0.71; '26,': 0.72; 'overall': 0.72; 'score': 0.76; 'counts': 0.81; 'walters': 0.84; 'approach.': 0.91; 'sorry.': 0.91; 'fun!': 0.95 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=QseBneF/WR3FQIwYcdBGYMC1jk9dBfLeJysIhCjQ9ng=; b=iwc/pb2c03m7GDwZne3priMzpsNI1PhJGjVI7ZV76n//iiansT812wzCDHL55d/1Fn +ONe0mmTreD3TbjNb2dKPY6K6rVwGzdNdjbJ5jxx6i5EOaQglyT1i/W6XVzS4KMbQksu 7l1palEQsIUu2x6kqrSwMcJeyUM0EoYQlm6ZxktFfS7E050YAR53GWtDSETD9Lu6fk38 8tI1B2rRAzojVUgZJkgywrDxEnVH/3+EjpZc041q6rbB/g49JJ2wtYgIf7j3pRAN3aEq Uqyt57ETDn88BJVUOMN16aILi0p4ZZ/R5hCa6/uYe39MyjoaKQZzNWxAWh0F682LskhR idMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=QseBneF/WR3FQIwYcdBGYMC1jk9dBfLeJysIhCjQ9ng=; b=ABqE2G27gufA/+JUgSZJuNtSU1ZiYpnibQcptANdg0Agkh3ImTD/MevzFJIfUkpY2M 640RNbtFImlKA/sHW1K2FiTBHP69c4nCyW5+ZCBCgQVpIsNDmda6r08qiw5Rj3FKNJzY 75/VdoLa+Nwl8lis+IOBJxj/zHPkOV5hodgDwDQ6BB/myq95LpfiSCkT+gG9sBM7u+xR LwlCICyQiMPy+NYbatIiDIJiNALOCp9FuCvkAaK9a8fKzfV+8p0U6iYJQA8b/9zBV5Dw UQZ+mR2Sx9mqJov7tUy56UeX5iL4d++YNpK7lVKfQahKNKlNxU14nfuofHc27kgn+4JZ CA/g== X-Gm-Message-State: AOPr4FWGT3MLVq7SSpChRth1NjkLnYlnFlsS9Xa782ewCvuuSkYLaXjGXV7unBaxMOSs5q4Dm/gxKuAvRm2j1w== X-Received: by 10.194.202.168 with SMTP id kj8mr17018804wjc.8.1460666902205; Thu, 14 Apr 2016 13:48:22 -0700 (PDT) In-Reply-To: X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Mailman-Original-Message-ID: X-Mailman-Original-References: Xref: csiph.com comp.lang.python:107012 On Thu, Apr 14, 2016, 7:37 PM justin walters wrote: > On Apr 14, 2016 9:41 AM, "Martin A. Brown" wrote: > > > > > > Greetings Justin, > > > > > score = sum_of_votes/num_of_votes > > > > >votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2), > (91, 1)] > > > > >Specifically, I'm wondering if this is a good algorithm for > > >weighted voting. Essentially a vote is weighted by the number of > > >votes it counts as. I realize that this is an extremely simple > > >algorithm, but I was wondering if anyone had suggestions on how to > > >improve it. > > > > I snipped most of your code. I don't see anything wrong with your > > overall approach. I will make one suggestion: watch out for > > DivisionByZero. > > > > try: > > score = sum_of_votes / num_of_votes > > except ZeroDivisionError: > > score = float('nan') > > > > In your example data, all of the weights were integers, which means > > that a simple mean function would work, as well, if you expanded the > > votes to an alternate representation: > > > > votes = [72, 72, 72, 72, 96, 96, 96, 48, 48, 53, 26, 26, 26, 26, > > 31, 31, 31, 68, 68, 91] > > > > But, don't bother! > > > > Your function can handle votes that have a float weight: > > > > >>> weight([(4, 1.3), (1, 1),]) > > 2.695652173913044 > > > > Have fun! > > > > -Martin > > > > -- > > Martin A. Brown > > http://linux-ip.net/ > > Thanks Martin! > > I'll add the check for division by zero. Didn't think about that. I think > I'm going to sanitize input anyways, but always better to be safe than > sorry. > I suggest not worrying about sanitizing inputs. If someone provides bad data, Python will do the right thing: stop the program and print an explanation of what went wrong, often a more helpful message than one you'd write. Use error handling mostly for when you want to do something *other* than stop the program. I'm not sure I'd use NaN instead of raise division by zero error. NaNs can be troublesome for downstream code that might not notice until it gets confusing. A div-by-zero error is clear and easier to track down because of the traceback. What do you think of using list comprehensions? weighted_sum = sum(rating * weight for rating, weight in votes) total_weights = sum(weight for rating, weight in votes) score = weighted_sum / total_weights It's two loops as I wrote it, which is instinctively slower, but it might actually execute faster because of the built-in sum vs a regular for loop. Not sure. >