From: MRAB
Newsgroups: comp.lang.python
Subject: Re: A fun python CLI program for all to enjoy!
Date: Fri, 6 May 2016 21:30:17 +0100
Message-ID: <7e52b918-2087-f93f-43cb-3411c1cdc881@mrabarnett.plus.com>

On 2016-05-06 20:10, DFS wrote:
> getAddresses.py
>
> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
> database, or writes them to text files for mailing labels, etc
>
> Now, just by typing 'fast food Taco Bell 10 db all' you can find
> out how many Taco Bells are within 10 miles of you, and store all the
> addresses in your own address database.
>
> No more convoluted Googling, or hitting the 'Next Page' button, or
> fumbling with the Yellow Pages...
>
> Note: the db structure is flat on purpose, and the .csv files aren't
> quote delimited.
>
> Put the program in its own directory.
> It creates the SQLite database there, and writes files there, too.
>
> Reviews of code, bug reports, criticisms, suggestions for improvement,
> etc are all welcome.
>

OK, you asked for it... :-)

1. It's shorter and clearer not to compare with True or False:

    if verbose:

and:

    if not dupeRow:

2. You can print a blank line with an empty print statement:

    print

3. When looking for unique items, a set is a better choice than a list:

    addrCheck = set()

    def addrUnique(addr):
        if addr not in addrCheck:
            x = True
            addrCheck.add(addr)
        else:
            x = False
        return x

4. Try string formatting instead of multiple concatenation:

    print "%s arguments" % argCnt

5. Strings have a .join method, which combines well with list slicing:

    keyw = "+".join(sys.argv[1 : argCnt - 5])

6. Another example of string formatting:

    search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)

7. It's recommended to use the 'with' statement when handling files:

    with open(webfile, "w") as f:
        if store == "csv":
            f.write("Name,Address,CityStateZip\n")

If you don't want to use the 'with' statement, note that closing the file is:

    f.close()

It needs the "()"! Without them, the line just refers to the method and never calls it.

8. When using SQL, you shouldn't try to insert the values yourself; you should use parametrised queries:

    cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
    if verbose:
        print cSQL
    db.execute(cSQL, (datasrc, search))
    conn.commit()

It'll insert the values where the "?" are and will do any necessary quoting itself. (Actually, some drivers use "?", others use "%s", so if it doesn't work with one, try the other.)

The way you wrote it, it would fail if a value contained a "'". It's that kind of thing that leads to SQL injection attacks.
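
To show points 3 and 8 working together, here is a minimal sketch rather than a drop-in replacement for anything in getAddresses.py. It is written in Python 3 syntax and uses an in-memory SQLite database. Only the table name `addresses` and the columns `datasrc` and `search` come from the query above. The other columns, the sample rows and the names `addr_check`, `addr_unique` and `insert_sql` are made up for illustration. One of the sample addresses contains a "'" on purpose, to show that the parametrised query copes with it.

```python
import sqlite3

# Hypothetical sample rows; the real program would scrape these.
rows = [
    ("Taco Bell", "12 O'Neil St", "Springfield, IL 62701"),
    ("Taco Bell", "12 O'Neil St", "Springfield, IL 62701"),  # duplicate
    ("Taco Bell", "400 Main St", "Springfield, IL 62702"),
]

conn = sqlite3.connect(":memory:")
db = conn.cursor()

# Assumed flat schema; only datasrc and search appear in the original post.
db.execute(
    "CREATE TABLE addresses "
    "(datasrc TEXT, search TEXT, name TEXT, addr TEXT, csz TEXT)"
)

datasrc = "usdirectory.com"        # assumed value
search = "fast+food+Taco+Bell 10"  # assumed value

# Point 3: a set makes the "have I seen this already?" test cheap.
addr_check = set()

def addr_unique(addr):
    """Return True the first time addr is seen, False afterwards."""
    if addr in addr_check:
        return False
    addr_check.add(addr)
    return True

# Point 8: the driver substitutes and quotes the values for each "?".
insert_sql = "INSERT INTO addresses VALUES (?, ?, ?, ?, ?);"
for name, addr, csz in rows:
    if addr_unique(addr):
        db.execute(insert_sql, (datasrc, search, name, addr, csz))
conn.commit()

stored = db.execute(
    "SELECT addr FROM addresses WHERE datasrc = ? AND search = ? ORDER BY addr;",
    (datasrc, search),
).fetchall()

# The same DELETE as in point 8, with the values passed separately.
db.execute(
    "DELETE FROM addresses WHERE datasrc = ? AND search = ?;",
    (datasrc, search),
)
conn.commit()
remaining = db.execute("SELECT COUNT(*) FROM addresses;").fetchone()[0]
conn.close()
```

The duplicate row is skipped by the set check, and the address with the apostrophe goes in and comes back out unchanged because the values never become part of the SQL text.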