Path: csiph.com!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail From: Michael Selik Newsgroups: comp.lang.python Subject: Re: Drowning in a teacup? Date: Sat, 02 Apr 2016 14:36:45 +0000 Lines: 116 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 X-Trace: news.uni-berlin.de dG3aCI9ZCqJGLgekPIvutAVrMxYGpE5xHAF+wfZlgE1w== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'elif': 0.04; 'pop': 0.05; 'python3': 0.05; 'sys': 0.05; '__name__': 0.07; 'main()': 0.07; 'matches': 0.07; 'strings.': 0.07; 'indeed,': 0.09; 'invocation': 0.09; 'item,': 0.09; 'modifies': 0.09; 'suggestions.': 0.09; 'def': 0.13; 'suggest': 0.15; '>>>': 0.15; "'__main__':": 0.16; '...)': 0.16; '2016': 0.16; 'beginning.': 0.16; 'cares': 0.16; 'datasets': 0.16; 'deque': 0.16; 'invocations': 0.16; 'key?': 0.16; 'keyword,': 0.16; 'main():': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'reverses': 0.16; 'sort()': 0.16; 'to:addr:web.de': 0.16; 'worst': 0.16; '\xc2\xa0if': 0.16; 'wrote:': 0.16; 'case.': 0.18; 'element': 0.18; '>': 0.18; 'tests': 0.18; 'runs': 0.18; '>>>': 0.20; 'to:2**1': 0.21; 'constant': 0.22; 'pos': 0.22; 'elements': 0.23; 'insert': 0.23; "python's": 0.23; 'sat,': 0.23; 'import': 0.24; 'header:In-Reply-To:1': 0.24; 'sort': 0.25; 'script': 0.25; 'skip:# 10': 0.27; 'message-id:@mail.gmail.com': 0.27; 'small,': 0.27; 'function': 0.28; "skip:' 10": 0.28; 'cat': 0.29; 'correct,': 0.29; 'index,': 0.29; 'sensible': 0.29; 'raise': 0.29; 'relative': 0.30; 'skip:[ 10': 0.31; "i'd": 0.31; 'option': 0.31; 'post': 0.31; "can't": 0.32; 'knows': 0.32; 'michael': 0.33; 'shift': 0.33; 'case,': 0.34; "skip:' 20": 0.34; 'add': 0.34; 'list': 0.34; 'received:google.com': 0.35; 'skip:3 10': 0.35; 'lists.': 0.35; 'mapping': 0.35; 'skip:. 20': 0.35; 'something': 0.35; 'item': 0.35; 'but': 0.36; 'list,': 0.36; 'skip:i 20': 0.36; 'there': 0.36; 'received:209.85': 0.36; 'beginning': 0.36; 'to:addr:python-list': 0.36; 'subject:?': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'skip:& 10': 0.37; 'expect': 0.37; 'spread': 0.37; 'list.': 0.37; 'difference': 0.38; 'front': 0.38; 'received:209': 0.38; 'log': 0.38; 'skip:p 20': 0.38; 'copying': 0.38; 'skip:o 20': 0.38; 'test': 0.39; 'to:addr:python.org': 0.40; 'where': 0.40; 'your': 0.60; 'more': 0.63; 'between': 0.65; 'better.': 0.66; 'here': 0.66; 'results': 0.66; 'skip:\xc2 10': 0.67; '100': 0.79; 'benchmark': 0.84; 'conclusions': 0.84; 'liberty': 0.84; 'otten': 0.84; 'rises': 0.84; 'snap': 0.84; 'fare': 0.93; 'factors': 0.97 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=gMGbumhzE+TKewnVr+7lV0wA77ic1lxpvZJGrvwdHAQWnj6X12qDgwaaOyrnlooVH6 g7sPWKjypYD93h/mWGovq2fAN9xaTlfhCMlMQvvYhHL17/84BnQFCh6Fl5bZBclknYrx 6pCIHcXI3U33DC7h2y035Pak1BToglwFHyfylEfcJw1pEtPQRuk2jv5IY9jNZ5fwT+Yw EUlH1P5p0hwjCG6Ujxc8guT8Ggsew7UlJTJFCk+X2bJnbHbBvryC7trKD1JZIM+E/N0Y QKnH9EHVFUMeCDEWgEh73YARMgzOPz5aCW9kFB9NxPGZuBo35yf9jYsCs2T1VU1lgFco NZuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=DuI/sRvPtf3lNDsIV+3DrJy1Azf2g8vorp0rOHaenuf0XFgWcVdMxh30WUr0J6cF/d M5pVKLJMVveLt9uN5M2hKObTiShZ5RUVvh6YzGuSgk9YjWKkpWPCKe0FvcNuHdlIZufk hh3D39ANgLptTJ81HKb2fVHinmKhPjuJuqAkrsb9iQnc354Iu24NclE8P+a/nEB5zasd tjBx2f0Y8VRNW6MAzj5KZvyWGdkxQg3yYatajae2+ZB20FkuIgDD2BC5WEyDMKbJhaEc JAzvI6uPWykCA/CeCOG07leRRpjjXfY1fg5oRi7pIqCi+jz0GBRtEVbugfYOeov3oFBc xnEg== X-Gm-Message-State: AD7BkJJeKN7G4KrrVBL1hbfUcPicpGC/tpavwWeV4nMz6B8T3kDM2dzgZS2qtqWQwkgBafVE4iePaw5YTfIuwg== X-Received: by 10.140.237.204 with SMTP id i195mr13843006qhc.55.1459607814898; Sat, 02 Apr 2016 07:36:54 -0700 (PDT) In-Reply-To: X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Xref: csiph.com comp.lang.python:106286 On Sat, Apr 2, 2016 at 6:32 AM Peter Otten <__peter__@web.de> wrote: > Vito De Tullio wrote: > > > Michael Selik wrote: > > > >>> > I need to scan a list of strings. If one of the elements matches the > >>> > beginning of a search keyword, that element needs to snap to the > front > >>> > of the list. > >>> > >>> I know this post regards the function passing, but, on you specific > >>> problem, > >>> can't you just ... sort the list with a custom key? > >> > >> If the number of matches is small relative to the size of the list, I'd > >> expect the sort would be slower than most of the other suggestions. > > > > umm... I take the liberty to set up a little test > > > > $ cat test_sort_prefix.py > > #!/usr/bin/env python3 > > > > from sys import argv > > from timeit import timeit > > > > > > def sort_in_place(l): > > def key(e): > > return not e.startswith('yes') > > l.sort(key=key) > > > > > > def original_solution(mylist): > > for i in range(len(mylist)): > > if(mylist[i].startswith('yes')): > > mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:] > > original_solution() reverses the order of the matches while sort_in_place() > doesn't. If the order is not relevant I'd try something like > > def exchange(items): > pos = 0 > for index, item in enumerate(items): > if item.startswith("yes"): > items[pos], items[index] = item, items[pos] > pos += 1 > > > def main(): > > if argv[1] == 'sort_in_place': > > f = sort_in_place > > elif argv[1] == 'original_solution': > > f = original_solution > > else: > > raise Exception() > > > > nomatches = int(argv[2]) > > matches = int(argv[3]) > > > > l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in > > range(matches)] > > Python's timsort knows how to deal with (partially) sorted lists. I suggest > that you add > > random.seed(42) # same list for all script invocations > random.shuffle(l) > > here to spread the matches somewhat over the list. > > > print(timeit("f(l)", number=100, globals={"f": f, "l": l})) > > Remember that f(l) modifies l, so all but the first invocation will see a > list where all matches are at the beginning. In other words: in 99 out of > 100 runs the list is already sorted. > > While adding the overhead of copying the list I would expect the results of > > timeit("f(l[:])", ...) > > to be more realistic. > > > if __name__ == '__main__': > > main() > > $ ./test_sort_prefix.py sort_in_place 1000000 3 > > 33.260575089996564 > > $ ./test_sort_prefix.py original_solution 1000000 3 > > 35.93009542999789 > > $ > > > > > > in my tests there is no sensible difference between the two solutions... > > and the number of matches is risible :) > > Indeed, as the number of matches rises your sort() approach will likely > fare > better. > > Your conclusions may be correct, but the benchmark to support them is > flawed. > > Option 1: scan and match, pop, insert Scan and match is O(n) pop is O(n) ... gotta shift everything over one after deleting insert is O(n) on a list, better use deque.appendleft for O(1) Option 2: map keyfunc, sort Mapping the keyfunction is O(n) (tim)Sorting is O(n log n) worst case, O(n) best case. Option 1 using a deque will be O(n). Option 2 will be O(n log n) worst case and O(n) best case. On small datasets the constant factors matter, but then it's small, so who cares about speed? (partially joking)