Re: Drowning in a teacup?

From	Peter Otten <__peter__@web.de>
Newsgroups	comp.lang.python
Subject	Re: Drowning in a teacup?
Date	2016-04-02 12:31 +0200
Organization	None
Message-ID	<mailman.361.1459593100.28225.python-list@python.org> (permalink)
References	<ndmlj2$pb0$1@gioia.aioe.org> <ndnm9v$j07$1@ger.gmane.org> <CAGgTfkO4HuzoRBgrGJ3ccqcD+VB4_7O-g19NUk9FaYSBpQ2HkA@mail.gmail.com> <ndnomv$mf5$1@ger.gmane.org>

Show all headers | View raw

Vito De Tullio wrote:

> Michael Selik wrote:
> 
>>> > I need to scan a list of strings. If one of the elements matches the
>>> > beginning of a search keyword, that element needs to snap to the front
>>> > of the list.
>>>
>>> I know this post regards the function passing, but, on you specific
>>> problem,
>>> can't you just ... sort the list with a custom key?
>>
>> If the number of matches is small relative to the size of the list, I'd
>> expect the sort would be slower than most of the other suggestions.
> 
> umm... I take the liberty to set up a little test
> 
>     $ cat test_sort_prefix.py
>     #!/usr/bin/env python3
> 
>     from sys import argv
>     from timeit import timeit
> 
> 
>     def sort_in_place(l):
>         def key(e):
>             return not e.startswith('yes')
>         l.sort(key=key)
> 
> 
>     def original_solution(mylist):
>         for i in range(len(mylist)):
>             if(mylist[i].startswith('yes')):
>                 mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:]

original_solution() reverses the order of the matches while sort_in_place() 
doesn't. If the order is not relevant I'd try something like

def exchange(items):
    pos = 0
    for index, item in enumerate(items):
        if item.startswith("yes"):
            items[pos], items[index] = item, items[pos]
            pos += 1

>     def main():
>         if argv[1] == 'sort_in_place':
>             f = sort_in_place
>         elif argv[1] == 'original_solution':
>             f = original_solution
>         else:
>             raise Exception()
> 
>         nomatches = int(argv[2])
>         matches = int(argv[3])
> 
>         l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in
> range(matches)]

Python's timsort knows how to deal with (partially) sorted lists. I suggest 
that you add

random.seed(42) # same list for all script invocations
random.shuffle(l)

here to spread the matches somewhat over the list.

>         print(timeit("f(l)", number=100, globals={"f": f, "l": l}))

Remember that f(l) modifies l, so all but the first invocation will see a 
list where all matches are at the beginning. In other words: in 99 out of 
100 runs the list is already sorted.

While adding the overhead of copying the list I would expect the results of

timeit("f(l[:])", ...)

to be more realistic.

>     if __name__ == '__main__':
>         main()
>     $ ./test_sort_prefix.py sort_in_place 1000000 3
>     33.260575089996564
>     $ ./test_sort_prefix.py original_solution 1000000 3
>     35.93009542999789
>     $
> 
> 
> in my tests there is no sensible difference between the two solutions...
> and the number of matches is risible :)

Indeed, as the number of matches rises your sort() approach will likely fare 
better.

Your conclusions may be correct, but the benchmark to support them is 
flawed.

Thread

Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-01 16:27 -0400
  Re: Drowning in a teacup? Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-04-01 20:39 +0000
    Re: Drowning in a teacup? Ian Kelly <ian.g.kelly@gmail.com> - 2016-04-01 14:52 -0600
  Re: Drowning in a teacup? MRAB <python@mrabarnett.plus.com> - 2016-04-01 21:45 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-01 20:46 +0000
  Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-02 09:46 +1100
  Re: Drowning in a teacup? Fillmore <fillmore_remove@hotmail.com> - 2016-04-02 00:03 -0400
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 07:45 +0200
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 05:51 +0000
  Re: Drowning in a teacup? Vito De Tullio <vito.detullio@gmail.com> - 2016-04-02 08:26 +0200
  Re: Drowning in a teacup? Peter Otten <__peter__@web.de> - 2016-04-02 12:31 +0200
  Re: Drowning in a teacup? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-04-02 12:53 +0100
  Re: Drowning in a teacup? Michael Selik <michael.selik@gmail.com> - 2016-04-02 14:36 +0000
  Re: Drowning in a teacup? Ned Batchelder <ned@nedbatchelder.com> - 2016-04-02 12:28 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 15:54 -0400
      Re: Drowning in a teacup? Steven D'Aprano <steve@pearwood.info> - 2016-04-03 12:33 +1000
    Re: Drowning in a teacup? Ethan Furman <ethan@stoneleaf.us> - 2016-04-02 16:15 -0700
    Re: Drowning in a teacup? Random832 <random832@fastmail.com> - 2016-04-02 19:19 -0400

csiph-web