Re: Review Request of Python Code

From	Matt Wheeler <m@funkyhat.org>
Newsgroups	comp.lang.python
Subject	Re: Review Request of Python Code
Date	2016-03-09 12:33 +0000
Message-ID	<mailman.76.1457526806.15725.python-list@python.org> (permalink)
References	<f0973a0d-62ba-402b-ab23-cb68bdd15323@googlegroups.com> <CAG93HwGPBK0Mbu-N+mE6+MtbgLKpyRb2UoO6VZ9k_9yQN46cRw@mail.gmail.com>


On 9 March 2016 at 12:06, Matt Wheeler <m@funkyhat.org> wrote:
> But we can still do better. A list is a poor choice for this kind of
> lookup, as Python has no way to find elements other than by checking
> them one after another. (Given that one of the names you've given it
> sounds a bit like "dictionary", I assume it contains rather a lot of
> items.)
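For a plain membership test, the difference between scanning a list and hashing into a set can be seen roughly like this. This is only a sketch: the word list is synthetic (the real data file isn't shown in the thread), and the absolute timings will vary by machine.

```python
import timeit

# Hypothetical synthetic word list standing in for the real data file.
words = [f"word{i}" for i in range(100_000)]
word_set = set(words)

target = "word99999"  # worst case for the list: it is the last element

# `in` on a list compares elements one after another (linear time);
# `in` on a set uses the element's hash (constant time on average).
list_time = timeit.timeit(lambda: target in words, number=100)
set_time = timeit.timeit(lambda: target in word_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```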

Sorry, I've just read your original code properly and see that you're
looking up the next item in the list. This means a set is not
suitable, as it doesn't preserve order. (Also, your original code is
open to an IndexError if the last element in your list is matched.)
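The IndexError risk can be reproduced with a small stand-in. The original code isn't quoted here, so `next_word_by_index` below is a hypothetical reconstruction of an index-based "find the word, then take the next item" lookup, run on a made-up sentence.

```python
def next_word_by_index(words, word):
    # Hypothetical reconstruction: locate the word with list.index()
    # (a linear scan), then read the item that follows it.
    windex = words.index(word)
    return words[windex + 1]


words = "the quick brown fox".split()

print(next_word_by_index(words, "quick"))  # brown

try:
    # "fox" is the last element, so windex + 1 is past the end of the list.
    next_word_by_index(words, "fox")
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```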

If you could provide a sample of the NewTotalTag.txt file data, that
would be helpful, but working with the information I've got, we can
still get a comparable speedup by constructing a dict up front that
maps each word to the next one[1]:

dict_word = dict_read.split()
dict_word.append('N/A')
# Assuming that 'N/A' is a reasonable output if the last word in your
# list is matched. This works around the IndexError your current code
# is exposed to.
# The slice ([:-1]) means we don't add the 'N/A' placeholder itself
# as a key in the new a4 dict.
a4 = {}
for index, word in enumerate(dict_word[:-1]):
    # setdefault keeps the first occurrence of a repeated word, which is
    # what the list .index() lookup in your original code would find.
    a4.setdefault(word, dict_word[index + 1])

This creates a dict where each key maps to the corresponding next
word, which you can use later in your lookup instead of fetching by
index, i.e. a4[word] instead of a4[windex+1].
This saves yet *another* scan through the entire list for each
positive match (`a4.index(word)` has to scan it again), because a dict
lookup is a hash lookup rather than a linear search.
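Here is the same idea as a self-contained sketch. `build_next_word_map` is a hypothetical helper name and the sample sentence is made up; `zip` over the list and its one-step-shifted copy is swapped in for the `enumerate` loop, and `dict.setdefault` keeps the first occurrence of a repeated word, as a list `.index()` lookup would.

```python
def build_next_word_map(text, sentinel="N/A"):
    """Map each word in `text` to the word that follows it.

    The last word maps to `sentinel`. For repeated words the first
    occurrence wins, matching what list.index() would have found.
    """
    words = text.split()
    words.append(sentinel)  # gives the final real word a "next" value
    next_word = {}
    # zip pairs each word with its successor; the sentinel is never a key
    # (unless it also appears as a real word in the text).
    for word, following in zip(words, words[1:]):
        next_word.setdefault(word, following)
    return next_word


a4 = build_next_word_map("the cat sat on the mat")
print(a4["the"])      # cat (first occurrence of "the")
print(a4["mat"])      # N/A (last word)
print(a4.get("dog"))  # None (word not present)
```

Each lookup is now a single dict access, so the cost no longer grows with the length of the word list.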

[1] Though I suspect that if we get to see a sample of your data file,
there may be a better way.

-- 
Matt Wheeler
http://funkyh.at
