Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #106286

Re: Drowning in a teacup?

Path csiph.com!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From Michael Selik <michael.selik@gmail.com>
Newsgroups comp.lang.python
Subject Re: Drowning in a teacup?
Date Sat, 02 Apr 2016 14:36:45 +0000
Lines 116
Message-ID <mailman.363.1459607817.28225.python-list@python.org> (permalink)
References <ndmlj2$pb0$1@gioia.aioe.org> <ndnm9v$j07$1@ger.gmane.org> <CAGgTfkO4HuzoRBgrGJ3ccqcD+VB4_7O-g19NUk9FaYSBpQ2HkA@mail.gmail.com> <ndnomv$mf5$1@ger.gmane.org> <ndo722$rv4$1@ger.gmane.org>
Mime-Version 1.0
Content-Type text/plain; charset=UTF-8
X-Trace news.uni-berlin.de dG3aCI9ZCqJGLgekPIvutAVrMxYGpE5xHAF+wfZlgE1w==
Return-Path <michael.selik@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.000
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'elif': 0.04; 'pop': 0.05; 'python3': 0.05; 'sys': 0.05; '__name__': 0.07; 'main()': 0.07; 'matches': 0.07; 'strings.': 0.07; 'indeed,': 0.09; 'invocation': 0.09; 'item,': 0.09; 'modifies': 0.09; 'suggestions.': 0.09; 'def': 0.13; 'suggest': 0.15; '&gt;&gt;&gt;': 0.15; "'__main__':": 0.16; '...)': 0.16; '2016': 0.16; 'beginning.': 0.16; 'cares': 0.16; 'datasets': 0.16; 'deque': 0.16; 'invocations': 0.16; 'key?': 0.16; 'keyword,': 0.16; 'main():': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'reverses': 0.16; 'sort()': 0.16; 'to:addr:web.de': 0.16; 'worst': 0.16; '\xc2\xa0if': 0.16; 'wrote:': 0.16; 'case.': 0.18; 'element': 0.18; '&gt;': 0.18; 'tests': 0.18; 'runs': 0.18; '>>>': 0.20; 'to:2**1': 0.21; 'constant': 0.22; 'pos': 0.22; 'elements': 0.23; 'insert': 0.23; "python's": 0.23; 'sat,': 0.23; 'import': 0.24; 'header:In-Reply-To:1': 0.24; 'sort': 0.25; 'script': 0.25; 'skip:# 10': 0.27; 'message-id:@mail.gmail.com': 0.27; 'small,': 0.27; 'function': 0.28; "skip:' 10": 0.28; 'cat': 0.29; 'correct,': 0.29; 'index,': 0.29; 'sensible': 0.29; 'raise': 0.29; 'relative': 0.30; 'skip:[ 10': 0.31; "i'd": 0.31; 'option': 0.31; 'post': 0.31; "can't": 0.32; 'knows': 0.32; 'michael': 0.33; 'shift': 0.33; 'case,': 0.34; "skip:' 20": 0.34; 'add': 0.34; 'list': 0.34; 'received:google.com': 0.35; 'skip:3 10': 0.35; 'lists.': 0.35; 'mapping': 0.35; 'skip:. 20': 0.35; 'something': 0.35; 'item': 0.35; 'but': 0.36; 'list,': 0.36; 'skip:i 20': 0.36; 'there': 0.36; 'received:209.85': 0.36; 'beginning': 0.36; 'to:addr:python-list': 0.36; 'subject:?': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'skip:& 10': 0.37; 'expect': 0.37; 'spread': 0.37; 'list.': 0.37; 'difference': 0.38; 'front': 0.38; 'received:209': 0.38; 'log': 0.38; 'skip:p 20': 0.38; 'copying': 0.38; 'skip:o 20': 0.38; 'test': 0.39; 'to:addr:python.org': 0.40; 'where': 0.40; 'your': 0.60; 'more': 0.63; 'between': 0.65; 'better.': 0.66; 'here': 0.66; 'results': 0.66; 'skip:\xc2 10': 0.67; '100': 0.79; 'benchmark': 0.84; 'conclusions': 0.84; 'liberty': 0.84; 'otten': 0.84; 'rises': 0.84; 'snap': 0.84; 'fare': 0.93; 'factors': 0.97
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=gMGbumhzE+TKewnVr+7lV0wA77ic1lxpvZJGrvwdHAQWnj6X12qDgwaaOyrnlooVH6 g7sPWKjypYD93h/mWGovq2fAN9xaTlfhCMlMQvvYhHL17/84BnQFCh6Fl5bZBclknYrx 6pCIHcXI3U33DC7h2y035Pak1BToglwFHyfylEfcJw1pEtPQRuk2jv5IY9jNZ5fwT+Yw EUlH1P5p0hwjCG6Ujxc8guT8Ggsew7UlJTJFCk+X2bJnbHbBvryC7trKD1JZIM+E/N0Y QKnH9EHVFUMeCDEWgEh73YARMgzOPz5aCW9kFB9NxPGZuBo35yf9jYsCs2T1VU1lgFco NZuQ==
X-Google-DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=DuI/sRvPtf3lNDsIV+3DrJy1Azf2g8vorp0rOHaenuf0XFgWcVdMxh30WUr0J6cF/d M5pVKLJMVveLt9uN5M2hKObTiShZ5RUVvh6YzGuSgk9YjWKkpWPCKe0FvcNuHdlIZufk hh3D39ANgLptTJ81HKb2fVHinmKhPjuJuqAkrsb9iQnc354Iu24NclE8P+a/nEB5zasd tjBx2f0Y8VRNW6MAzj5KZvyWGdkxQg3yYatajae2+ZB20FkuIgDD2BC5WEyDMKbJhaEc JAzvI6uPWykCA/CeCOG07leRRpjjXfY1fg5oRi7pIqCi+jz0GBRtEVbugfYOeov3oFBc xnEg==
X-Gm-Message-State AD7BkJJeKN7G4KrrVBL1hbfUcPicpGC/tpavwWeV4nMz6B8T3kDM2dzgZS2qtqWQwkgBafVE4iePaw5YTfIuwg==
X-Received by 10.140.237.204 with SMTP id i195mr13843006qhc.55.1459607814898; Sat, 02 Apr 2016 07:36:54 -0700 (PDT)
In-Reply-To <ndo722$rv4$1@ger.gmane.org>
X-Content-Filtered-By Mailman/MimeDel 2.1.21
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.21
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Xref csiph.com comp.lang.python:106286

Show key headers only | View raw


On Sat, Apr 2, 2016 at 6:32 AM Peter Otten <__peter__@web.de> wrote:

> Vito De Tullio wrote:
>
> > Michael Selik wrote:
> >
> >>> > I need to scan a list of strings. If one of the elements matches the
> >>> > beginning of a search keyword, that element needs to snap to the
> front
> >>> > of the list.
> >>>
> >>> I know this post regards the function passing, but, on you specific
> >>> problem,
> >>> can't you just ... sort the list with a custom key?
> >>
> >> If the number of matches is small relative to the size of the list, I'd
> >> expect the sort would be slower than most of the other suggestions.
> >
> > umm... I take the liberty to set up a little test
> >
> >     $ cat test_sort_prefix.py
> >     #!/usr/bin/env python3
> >
> >     from sys import argv
> >     from timeit import timeit
> >
> >
> >     def sort_in_place(l):
> >         def key(e):
> >             return not e.startswith('yes')
> >         l.sort(key=key)
> >
> >
> >     def original_solution(mylist):
> >         for i in range(len(mylist)):
> >             if(mylist[i].startswith('yes')):
> >                 mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:]
>
> original_solution() reverses the order of the matches while sort_in_place()
> doesn't. If the order is not relevant I'd try something like
>
> def exchange(items):
>     pos = 0
>     for index, item in enumerate(items):
>         if item.startswith("yes"):
>             items[pos], items[index] = item, items[pos]
>             pos += 1
>
> >     def main():
> >         if argv[1] == 'sort_in_place':
> >             f = sort_in_place
> >         elif argv[1] == 'original_solution':
> >             f = original_solution
> >         else:
> >             raise Exception()
> >
> >         nomatches = int(argv[2])
> >         matches = int(argv[3])
> >
> >         l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in
> > range(matches)]
>
> Python's timsort knows how to deal with (partially) sorted lists. I suggest
> that you add
>
> random.seed(42) # same list for all script invocations
> random.shuffle(l)
>
> here to spread the matches somewhat over the list.
>
> >         print(timeit("f(l)", number=100, globals={"f": f, "l": l}))
>
> Remember that f(l) modifies l, so all but the first invocation will see a
> list where all matches are at the beginning. In other words: in 99 out of
> 100 runs the list is already sorted.
>
> While adding the overhead of copying the list I would expect the results of
>
> timeit("f(l[:])", ...)
>
> to be more realistic.
>
> >     if __name__ == '__main__':
> >         main()
> >     $ ./test_sort_prefix.py sort_in_place 1000000 3
> >     33.260575089996564
> >     $ ./test_sort_prefix.py original_solution 1000000 3
> >     35.93009542999789
> >     $
> >
> >
> > in my tests there is no sensible difference between the two solutions...
> > and the number of matches is risible :)
>
> Indeed, as the number of matches rises your sort() approach will likely
> fare
> better.
>
> Your conclusions may be correct, but the benchmark to support them is
> flawed.
>
>
Option 1: scan and match, pop, insert
Scan and match is O(n)
pop is O(n) ... gotta shift everything over one after deleting
insert is O(n) on a list, better use deque.appendleft for O(1)

Option 2: map keyfunc, sort
Mapping the keyfunction is O(n)
(tim)Sorting is O(n log n) worst case, O(n) best case.

Option 1 using a deque will be O(n).
Option 2 will be O(n log n) worst case and O(n) best case.

On small datasets the constant factors matter, but then it's small, so who
cares about speed? (partially joking)

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-01 16:27 -0400
  Re: Drowning in a teacup? Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-04-01 20:39 +0000
    Re: Drowning in a teacup? Ian Kelly <ian.g.kelly@gmail.com> - 2016-04-01 14:52 -0600
  Re: Drowning in a teacup? MRAB <python@mrabarnett.plus.com> - 2016-04-01 21:45 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-01 20:46 +0000
  Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-02 09:46 +1100
  Re: Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-02 00:03 -0400
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 07:45 +0200
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 05:51 +0000
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 08:26 +0200
  Re: Drowning in a teacup? Peter Otten <__peter__@web.de> - 2016-04-02 12:31 +0200
  Re: Drowning in a teacup? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-02 12:53 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 14:36 +0000
  Re: Drowning in a teacup? Ned Batchelder <ned@nedbatchelder.com> - 2016-04-02 12:28 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 15:54 -0400
      Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 12:33 +1000
    Re: Drowning in a teacup? Ethan Furman <ethan@stoneleaf.us> - 2016-04-02 16:15 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 19:19 -0400

csiph-web