From: Matt Wheeler
Newsgroups: comp.lang.python
Subject: Re: Review Request of Python Code
Date: Thu, 10 Mar 2016 18:51:46 +0000
On 10 March 2016 at 18:12, wrote:
> Matt, thank you for if...else suggestion, the data of NewTotalTag.txt
> is like a simple list of words with unconventional tags, like,
>
> w1 tag1
> w2 tag2
> w3 tag3
> ...
> ...
> w3 tag3
>
> like that.

I suspected so. The way your code currently works, if your input text
contains one of the tags, e.g. 'tag1', you'll get an entry in your
output something like 'tag1/w2'. I assume you don't want that :).

This is because you're putting all of the words and all of the tags
into a single flat list, so a tag in your input matches just like a
word does, and the item that follows it in the list gets used as its
tag. Try something along the lines of:

    dict_word = {}  # empty dictionary
    for line in dict_read.splitlines():
        word, tag = line.split(' ')
        dict_word[word] = tag

Notice I'm using splitlines() instead of split() to do the initial
chopping up of your input. By default split() splits on any whitespace,
which is what mixed your words and tags together in one list.
splitlines() splits only at line breaks, so each word stays together
with the tag on its line.

I would move this, along with opening the file, into a separate function
at this point. Large blobs of sequential code are not particularly easy
on the eyes or the brain, so choose a sensible name, like
load_dictionary. Perhaps something you could call like this:

    dict_word = load_dictionary("NewTotalTag.txt")

You also never close the file you open. Once you've loaded the data from
it there's no need to keep the file open (look up context managers and
the with statement).

-- 
Matt Wheeler
http://funkyh.at
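Following the load_dictionary suggestion above, here is a minimal sketch of how the pieces might fit together. It assumes NewTotalTag.txt holds one "word tag" pair per line, as in the quoted sample. parse_dictionary and tag_text are hypothetical helper names, and the "word/tag" output format is inferred from the 'tag1/w2' example rather than taken from the original code. The with block closes the file automatically, and line.split(None, 1) is used instead of line.split(' ') so that runs of spaces or tabs between a word and its tag are tolerated.

```python
import os
import tempfile


def parse_dictionary(text):
    """Hypothetical helper: build a {word: tag} dict from 'word tag' lines."""
    dict_word = {}
    for line in text.splitlines():  # one entry per line, not per whitespace run
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing to unpack them
        word, tag = line.split(None, 1)  # split at the first whitespace run only
        dict_word[word] = tag
    return dict_word


def load_dictionary(path):
    """Read the file at path; the with statement closes it when done."""
    with open(path) as f:
        return parse_dictionary(f.read())


def tag_text(text, dict_word):
    """Hypothetical helper: emit 'word/tag' for known words, others unchanged.

    Lookups go through the dict keys (words only), so a tag appearing in
    the input is no longer treated as a word.
    """
    return " ".join(
        "{}/{}".format(w, dict_word[w]) if w in dict_word else w
        for w in text.split()
    )


if __name__ == "__main__":
    # Write a small sample file so the example runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("w1 tag1\nw2 tag2\nw3 tag3\n")
        path = f.name
    try:
        dict_word = load_dictionary(path)
        print(tag_text("w1 tag1 w3", dict_word))  # prints: w1/tag1 tag1 w3/tag3
    finally:
        os.remove(path)
```

Because the words are dictionary keys, 'tag1' in the input is simply passed through instead of producing 'tag1/w2'.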