Re: Drowning in a teacup?

From	Michael Selik <michael.selik@gmail.com>
Newsgroups	comp.lang.python
Subject	Re: Drowning in a teacup?
Date	2016-04-02 14:36 +0000
Message-ID	<mailman.363.1459607817.28225.python-list@python.org> (permalink)
References	<ndmlj2$pb0$1@gioia.aioe.org> <ndnm9v$j07$1@ger.gmane.org> <CAGgTfkO4HuzoRBgrGJ3ccqcD+VB4_7O-g19NUk9FaYSBpQ2HkA@mail.gmail.com> <ndnomv$mf5$1@ger.gmane.org> <ndo722$rv4$1@ger.gmane.org>

Show all headers | View raw

On Sat, Apr 2, 2016 at 6:32 AM Peter Otten <__peter__@web.de> wrote:

> Vito De Tullio wrote:
>
> > Michael Selik wrote:
> >
> >>> > I need to scan a list of strings. If one of the elements matches the
> >>> > beginning of a search keyword, that element needs to snap to the
> front
> >>> > of the list.
> >>>
> >>> I know this post regards the function passing, but, on you specific
> >>> problem,
> >>> can't you just ... sort the list with a custom key?
> >>
> >> If the number of matches is small relative to the size of the list, I'd
> >> expect the sort would be slower than most of the other suggestions.
> >
> > umm... I take the liberty to set up a little test
> >
> >     $ cat test_sort_prefix.py
> >     #!/usr/bin/env python3
> >
> >     from sys import argv
> >     from timeit import timeit
> >
> >
> >     def sort_in_place(l):
> >         def key(e):
> >             return not e.startswith('yes')
> >         l.sort(key=key)
> >
> >
> >     def original_solution(mylist):
> >         for i in range(len(mylist)):
> >             if(mylist[i].startswith('yes')):
> >                 mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:]
>
> original_solution() reverses the order of the matches while sort_in_place()
> doesn't. If the order is not relevant I'd try something like
>
> def exchange(items):
>     pos = 0
>     for index, item in enumerate(items):
>         if item.startswith("yes"):
>             items[pos], items[index] = item, items[pos]
>             pos += 1
>
> >     def main():
> >         if argv[1] == 'sort_in_place':
> >             f = sort_in_place
> >         elif argv[1] == 'original_solution':
> >             f = original_solution
> >         else:
> >             raise Exception()
> >
> >         nomatches = int(argv[2])
> >         matches = int(argv[3])
> >
> >         l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in
> > range(matches)]
>
> Python's timsort knows how to deal with (partially) sorted lists. I suggest
> that you add
>
> random.seed(42) # same list for all script invocations
> random.shuffle(l)
>
> here to spread the matches somewhat over the list.
>
> >         print(timeit("f(l)", number=100, globals={"f": f, "l": l}))
>
> Remember that f(l) modifies l, so all but the first invocation will see a
> list where all matches are at the beginning. In other words: in 99 out of
> 100 runs the list is already sorted.
>
> While adding the overhead of copying the list I would expect the results of
>
> timeit("f(l[:])", ...)
>
> to be more realistic.
>
> >     if __name__ == '__main__':
> >         main()
> >     $ ./test_sort_prefix.py sort_in_place 1000000 3
> >     33.260575089996564
> >     $ ./test_sort_prefix.py original_solution 1000000 3
> >     35.93009542999789
> >     $
> >
> >
> > in my tests there is no sensible difference between the two solutions...
> > and the number of matches is risible :)
>
> Indeed, as the number of matches rises your sort() approach will likely
> fare
> better.
>
> Your conclusions may be correct, but the benchmark to support them is
> flawed.
>
>
Option 1: scan and match, pop, insert
Scan and match is O(n)
pop is O(n) ... gotta shift everything over one after deleting
insert is O(n) on a list, better use deque.appendleft for O(1)

Option 2: map keyfunc, sort
Mapping the keyfunction is O(n)
(tim)Sorting is O(n log n) worst case, O(n) best case.

Option 1 using a deque will be O(n).
Option 2 will be O(n log n) worst case and O(n) best case.

On small datasets the constant factors matter, but then it's small, so who
cares about speed? (partially joking)

Thread

Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-01 16:27 -0400
  Re: Drowning in a teacup? Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-04-01 20:39 +0000
    Re: Drowning in a teacup? Ian Kelly <ian.g.kelly@gmail.com> - 2016-04-01 14:52 -0600
  Re: Drowning in a teacup? MRAB <python@mrabarnett.plus.com> - 2016-04-01 21:45 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-01 20:46 +0000
  Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-02 09:46 +1100
  Re: Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-02 00:03 -0400
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 07:45 +0200
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 05:51 +0000
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 08:26 +0200
  Re: Drowning in a teacup? Peter Otten <__peter__@web.de> - 2016-04-02 12:31 +0200
  Re: Drowning in a teacup? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-02 12:53 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 14:36 +0000
  Re: Drowning in a teacup? Ned Batchelder <ned@nedbatchelder.com> - 2016-04-02 12:28 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 15:54 -0400
      Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 12:33 +1000
    Re: Drowning in a teacup? Ethan Furman <ethan@stoneleaf.us> - 2016-04-02 16:15 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 19:19 -0400

csiph-web