Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #106286
| Path | csiph.com!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Michael Selik <michael.selik@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: Drowning in a teacup? |
| Date | Sat, 02 Apr 2016 14:36:45 +0000 |
| Lines | 116 |
| Message-ID | <mailman.363.1459607817.28225.python-list@python.org> (permalink) |
| References | <ndmlj2$pb0$1@gioia.aioe.org> <ndnm9v$j07$1@ger.gmane.org> <CAGgTfkO4HuzoRBgrGJ3ccqcD+VB4_7O-g19NUk9FaYSBpQ2HkA@mail.gmail.com> <ndnomv$mf5$1@ger.gmane.org> <ndo722$rv4$1@ger.gmane.org> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8 |
| X-Trace | news.uni-berlin.de dG3aCI9ZCqJGLgekPIvutAVrMxYGpE5xHAF+wfZlgE1w== |
| Return-Path | <michael.selik@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| X-Spam-Evidence | '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'elif': 0.04; 'pop': 0.05; 'python3': 0.05; 'sys': 0.05; '__name__': 0.07; 'main()': 0.07; 'matches': 0.07; 'strings.': 0.07; 'indeed,': 0.09; 'invocation': 0.09; 'item,': 0.09; 'modifies': 0.09; 'suggestions.': 0.09; 'def': 0.13; 'suggest': 0.15; '>>>': 0.15; "'__main__':": 0.16; '...)': 0.16; '2016': 0.16; 'beginning.': 0.16; 'cares': 0.16; 'datasets': 0.16; 'deque': 0.16; 'invocations': 0.16; 'key?': 0.16; 'keyword,': 0.16; 'main():': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'reverses': 0.16; 'sort()': 0.16; 'to:addr:web.de': 0.16; 'worst': 0.16; '\xc2\xa0if': 0.16; 'wrote:': 0.16; 'case.': 0.18; 'element': 0.18; '>': 0.18; 'tests': 0.18; 'runs': 0.18; '>>>': 0.20; 'to:2**1': 0.21; 'constant': 0.22; 'pos': 0.22; 'elements': 0.23; 'insert': 0.23; "python's": 0.23; 'sat,': 0.23; 'import': 0.24; 'header:In-Reply-To:1': 0.24; 'sort': 0.25; 'script': 0.25; 'skip:# 10': 0.27; 'message-id:@mail.gmail.com': 0.27; 'small,': 0.27; 'function': 0.28; "skip:' 10": 0.28; 'cat': 0.29; 'correct,': 0.29; 'index,': 0.29; 'sensible': 0.29; 'raise': 0.29; 'relative': 0.30; 'skip:[ 10': 0.31; "i'd": 0.31; 'option': 0.31; 'post': 0.31; "can't": 0.32; 'knows': 0.32; 'michael': 0.33; 'shift': 0.33; 'case,': 0.34; "skip:' 20": 0.34; 'add': 0.34; 'list': 0.34; 'received:google.com': 0.35; 'skip:3 10': 0.35; 'lists.': 0.35; 'mapping': 0.35; 'skip:. 20': 0.35; 'something': 0.35; 'item': 0.35; 'but': 0.36; 'list,': 0.36; 'skip:i 20': 0.36; 'there': 0.36; 'received:209.85': 0.36; 'beginning': 0.36; 'to:addr:python-list': 0.36; 'subject:?': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'skip:& 10': 0.37; 'expect': 0.37; 'spread': 0.37; 'list.': 0.37; 'difference': 0.38; 'front': 0.38; 'received:209': 0.38; 'log': 0.38; 'skip:p 20': 0.38; 'copying': 0.38; 'skip:o 20': 0.38; 'test': 0.39; 'to:addr:python.org': 0.40; 'where': 0.40; 'your': 0.60; 'more': 0.63; 'between': 0.65; 'better.': 0.66; 'here': 0.66; 'results': 0.66; 'skip:\xc2 10': 0.67; '100': 0.79; 'benchmark': 0.84; 'conclusions': 0.84; 'liberty': 0.84; 'otten': 0.84; 'rises': 0.84; 'snap': 0.84; 'fare': 0.93; 'factors': 0.97 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=gMGbumhzE+TKewnVr+7lV0wA77ic1lxpvZJGrvwdHAQWnj6X12qDgwaaOyrnlooVH6 g7sPWKjypYD93h/mWGovq2fAN9xaTlfhCMlMQvvYhHL17/84BnQFCh6Fl5bZBclknYrx 6pCIHcXI3U33DC7h2y035Pak1BToglwFHyfylEfcJw1pEtPQRuk2jv5IY9jNZ5fwT+Yw EUlH1P5p0hwjCG6Ujxc8guT8Ggsew7UlJTJFCk+X2bJnbHbBvryC7trKD1JZIM+E/N0Y QKnH9EHVFUMeCDEWgEh73YARMgzOPz5aCW9kFB9NxPGZuBo35yf9jYsCs2T1VU1lgFco NZuQ== |
| X-Google-DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=46lafslIB6NY4p8bG2vth9IKhvPtXxr8yHMk/2AKrjw=; b=DuI/sRvPtf3lNDsIV+3DrJy1Azf2g8vorp0rOHaenuf0XFgWcVdMxh30WUr0J6cF/d M5pVKLJMVveLt9uN5M2hKObTiShZ5RUVvh6YzGuSgk9YjWKkpWPCKe0FvcNuHdlIZufk hh3D39ANgLptTJ81HKb2fVHinmKhPjuJuqAkrsb9iQnc354Iu24NclE8P+a/nEB5zasd tjBx2f0Y8VRNW6MAzj5KZvyWGdkxQg3yYatajae2+ZB20FkuIgDD2BC5WEyDMKbJhaEc JAzvI6uPWykCA/CeCOG07leRRpjjXfY1fg5oRi7pIqCi+jz0GBRtEVbugfYOeov3oFBc xnEg== |
| X-Gm-Message-State | AD7BkJJeKN7G4KrrVBL1hbfUcPicpGC/tpavwWeV4nMz6B8T3kDM2dzgZS2qtqWQwkgBafVE4iePaw5YTfIuwg== |
| X-Received | by 10.140.237.204 with SMTP id i195mr13843006qhc.55.1459607814898; Sat, 02 Apr 2016 07:36:54 -0700 (PDT) |
| In-Reply-To | <ndo722$rv4$1@ger.gmane.org> |
| X-Content-Filtered-By | Mailman/MimeDel 2.1.21 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.21 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Xref | csiph.com comp.lang.python:106286 |
Show key headers only | View raw
On Sat, Apr 2, 2016 at 6:32 AM Peter Otten <__peter__@web.de> wrote:
> Vito De Tullio wrote:
>
> > Michael Selik wrote:
> >
> >>> > I need to scan a list of strings. If one of the elements matches the
> >>> > beginning of a search keyword, that element needs to snap to the
> front
> >>> > of the list.
> >>>
> >>> I know this post regards the function passing, but, on you specific
> >>> problem,
> >>> can't you just ... sort the list with a custom key?
> >>
> >> If the number of matches is small relative to the size of the list, I'd
> >> expect the sort would be slower than most of the other suggestions.
> >
> > umm... I take the liberty to set up a little test
> >
> > $ cat test_sort_prefix.py
> > #!/usr/bin/env python3
> >
> > from sys import argv
> > from timeit import timeit
> >
> >
> > def sort_in_place(l):
> > def key(e):
> > return not e.startswith('yes')
> > l.sort(key=key)
> >
> >
> > def original_solution(mylist):
> > for i in range(len(mylist)):
> > if(mylist[i].startswith('yes')):
> > mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:]
>
> original_solution() reverses the order of the matches while sort_in_place()
> doesn't. If the order is not relevant I'd try something like
>
> def exchange(items):
> pos = 0
> for index, item in enumerate(items):
> if item.startswith("yes"):
> items[pos], items[index] = item, items[pos]
> pos += 1
>
> > def main():
> > if argv[1] == 'sort_in_place':
> > f = sort_in_place
> > elif argv[1] == 'original_solution':
> > f = original_solution
> > else:
> > raise Exception()
> >
> > nomatches = int(argv[2])
> > matches = int(argv[3])
> >
> > l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in
> > range(matches)]
>
> Python's timsort knows how to deal with (partially) sorted lists. I suggest
> that you add
>
> random.seed(42) # same list for all script invocations
> random.shuffle(l)
>
> here to spread the matches somewhat over the list.
>
> > print(timeit("f(l)", number=100, globals={"f": f, "l": l}))
>
> Remember that f(l) modifies l, so all but the first invocation will see a
> list where all matches are at the beginning. In other words: in 99 out of
> 100 runs the list is already sorted.
>
> While adding the overhead of copying the list I would expect the results of
>
> timeit("f(l[:])", ...)
>
> to be more realistic.
>
> > if __name__ == '__main__':
> > main()
> > $ ./test_sort_prefix.py sort_in_place 1000000 3
> > 33.260575089996564
> > $ ./test_sort_prefix.py original_solution 1000000 3
> > 35.93009542999789
> > $
> >
> >
> > in my tests there is no sensible difference between the two solutions...
> > and the number of matches is risible :)
>
> Indeed, as the number of matches rises your sort() approach will likely
> fare
> better.
>
> Your conclusions may be correct, but the benchmark to support them is
> flawed.
>
>
Option 1: scan and match, pop, insert
Scan and match is O(n)
pop is O(n) ... gotta shift everything over one after deleting
insert is O(n) on a list, better use deque.appendleft for O(1)
Option 2: map keyfunc, sort
Mapping the keyfunction is O(n)
(tim)Sorting is O(n log n) worst case, O(n) best case.
Option 1 using a deque will be O(n).
Option 2 will be O(n log n) worst case and O(n) best case.
On small datasets the constant factors matter, but then it's small, so who
cares about speed? (partially joking)
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-01 16:27 -0400
Re: Drowning in a teacup? Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-04-01 20:39 +0000
Re: Drowning in a teacup? Ian Kelly <ian.g.kelly@gmail.com> - 2016-04-01 14:52 -0600
Re: Drowning in a teacup? MRAB <python@mrabarnett.plus.com> - 2016-04-01 21:45 +0100
Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-01 20:46 +0000
Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-02 09:46 +1100
Re: Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-02 00:03 -0400
Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 07:45 +0200
Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 05:51 +0000
Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 08:26 +0200
Re: Drowning in a teacup? Peter Otten <__peter__@web.de> - 2016-04-02 12:31 +0200
Re: Drowning in a teacup? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-02 12:53 +0100
Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 14:36 +0000
Re: Drowning in a teacup? Ned Batchelder <ned@nedbatchelder.com> - 2016-04-02 12:28 -0700
Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 15:54 -0400
Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 12:33 +1000
Re: Drowning in a teacup? Ethan Furman <ethan@stoneleaf.us> - 2016-04-02 16:15 -0700
Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 19:19 -0400
csiph-web