Groups > comp.lang.python > #108236 > unrolled thread
| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2016-05-06 15:10 -0400 |
| Last post | 2016-05-07 15:59 +0200 |
| Articles | 10 — 7 participants |
A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 15:10 -0400
Re: A fun python CLI program for all to enjoy! MRAB <python@mrabarnett.plus.com> - 2016-05-06 21:30 +0100
Re: A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 19:12 -0400
Re: A fun python CLI program for all to enjoy! Ethan Furman <ethan@stoneleaf.us> - 2016-05-06 16:29 -0700
Re: A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 19:58 -0400
Re: A fun python CLI program for all to enjoy! MRAB <python@mrabarnett.plus.com> - 2016-05-07 01:38 +0100
Re: A fun python CLI program for all to enjoy! Stephen Hansen <me+python@ixokai.io> - 2016-05-06 23:03 -0700
Re: A fun python CLI program for all to enjoy! Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-07 18:24 +1200
Re: A fun python CLI program for all to enjoy! alister <alister.ware@ntlworld.com> - 2016-05-07 08:51 +0000
Re: A fun python CLI program for all to enjoy! Peter Otten <__peter__@web.de> - 2016-05-07 15:59 +0200
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-06 15:10 -0400 |
| Subject | A fun python CLI program for all to enjoy! |
| Message-ID | <ngiq0u$brn$1@dont-email.me> |
getAddresses.py
Scrapes addresses from www.usdirectory.com and stores them in a SQLite
database, or writes them to text files for mailing labels, etc
Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
out how many Taco Bells are within 10 miles of you, and store all the
addresses in your own address database.
No more convoluted Googling, or hitting the 'Next Page' button, or
fumbling with the Yellow Pages...
Note: the db structure is flat on purpose, and the .csv files aren't
quote delimited.
Put the program in its own directory. It creates the SQLite database
there, and writes files there, too.
Reviews of code, bug reports, criticisms, suggestions for improvement,
etc are all welcome.
Enjoy!
========================================================================
#getAddresses.py
import os, sys, requests, time, datetime
from lxml import html
import pyodbc, sqlite3, re
#show values of variables, HTML content, etc
#set it to False for short/concise program output
verbose = False
if verbose == True:
    print "The verbose setting is turned On."
    print ""
#check if address is unique
addrCheck = []
def addrUnique(addr):
    if addr not in addrCheck:
        x = True
        addrCheck.append(addr)
    else: x = False
    return x
#validate and parse command line
def showHelp():
    print ""
    print " Enter search word(s), city or zip, state, miles to search, txt or csv or db, # addresses to save (no commas)"
    print ""
    print " eg: restaurant Knoxville TN 10 txt 50"
    print " search for restaurants within 10 miles of Knoxville TN, and write"
    print " the first 50 address to a txt file"
    print ""
    print " eg: furniture 30303 GA 20 csv all"
    print " search for furniture within 20 miles of zip 30303 GA,"
    print " and write all results to a csv file"
    print ""
    print " eg: boxing gyms Detroit MI 10 db 5"
    print " search for boxing gyms within 10 miles of Detroit MI, and store"
    print " the first 5 results in a database"
    print ""
    print " All entries are case-insensitive (ie TX or tx are acceptable)"
    exit(0)
argCnt = len(sys.argv)
if argCnt < 7: showHelp()
if verbose == True:
    print ""
    print str(argCnt) + " arguments"
keyw = "" #eg restaurant, boxing gym
if argCnt == 7: keyw = sys.argv[1] #one search word
if argCnt > 7: #multiple search words
    for i in range(1,argCnt-5):
        keyw = keyw + sys.argv[i] + "+"
    keyw = keyw[:-1] #drop trailing + sign
cityzip = sys.argv[argCnt-5] #eg Atlanta or 30339
state = sys.argv[argCnt-4] #eg GA
miles = sys.argv[argCnt-3] #eg 5,10,20,30,50 (website allows max 30)
store = sys.argv[argCnt-2] #write address to file or database
addrWant = sys.argv[argCnt-1] #eg save All or number >0
if addrWant.lower() != "all": #how many addresses to save
    if addrWant.isdigit() == False: showHelp()
    if addrWant == "0" : showHelp()
    addrWant = int(addrWant)
elif addrWant.lower() == "all": addrWant = addrWant.lower()
else: addrWant = int(addrWant)
if store != "csv" and store != "txt" and store != "db": showHelp()
#begin timing the code
startTime = time.clock()
#website, SQLite db, search string, current date/time for use with db
datasrc = "www.usdirectory.com"
dbName = "addresses.sqlite"
search = keyw + " " + str(cityzip) + " " + state + " " + str(miles) + " " + str(addrWant)
loaddt = datetime.datetime.now()
#write addresses to file
#each time the same search is done, the file is deleted and recreated
if store == "csv" or store == "txt":
    #csv will write in .csv format - header and 1 line per address
    #txt will write out 3 lines per address, then blank before next address
    webfile = "usdirectory.com_"+keyw+"_"+cityzip+"_"+state+"."+store
    f = open(webfile,"w")
    if store == "csv": f.write("Name,Address,CityStateZip\n")
    f.close
#store addresses in database
cSQL = ""
if store == "db":
    #creates a SQLite database that Access 2003 can't read
    #conn = sqlite3.connect(dbName)
    #also creates a SQLite database that Access 2003 can't read
    conn = pyodbc.connect('Driver={SQLite3 ODBC Driver};Database=' + dbName)
    db = conn.cursor()
    cSQL = "CREATE TABLE If Not Exists ADDRESSES "
    cSQL += "(datasrc, search, category, name, street, city, state, zip, loaddt, "
    cSQL += "PRIMARY KEY (datasrc, search, name, street));"
    db.execute(cSQL)
    # cSQL = "CREATE TABLE If Not Exists CATEGORIES "
    # cSQL += "(catID INTEGER PRIMARY KEY, catDesc);"
    # db.execute(cSQL)
    # db.execute("CREATE UNIQUE INDEX If Not Exists UIDX_CATDESC ON CATEGORIES (catDesc);")
    conn.commit()
    if verbose == True:
        print("connected to database: " + dbName)
        print cSQL
        print("created table: addresses")
        print("")
if verbose == True:
    print "Search summary"
    print "------------------------------"
    print "Keywords: " + keyw
    print "City/Zip: " + cityzip
    print "State : " + state
    print "Radius : " + str(miles) + " miles"
    print "Save : " + str(addrWant) + " addresses to " + store
    print "------------------------------"
    print ""
#build url
wBase = "http://www.usdirectory.com"
wForm = "/ypr.aspx?fromform=qsearch"
wKeyw = "&qhqn=" + keyw
wCityZip = "&qc=" + cityzip
wState = "&qs=" + state
wDist = "&rg=" + str(miles)
wSort = "&sb=a2z" #sort alpha
wPage = "&ap=" #used with the results page number
webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist
if verbose == True:
    print "Search url: \n" + webpage
    print ""
#delete previous results of identical search
if store == "db":
    cSQL = "DELETE FROM addresses "
    cSQL += "WHERE datasrc = '" + datasrc + "' "
    cSQL += "AND search = '" + search + "';"
    if verbose == True: print cSQL
    db.execute(cSQL)
    conn.commit()
#query web server, save results
print "searching..."
i = 0
dupes = 0
addrReturned = 0
addrSaved = 0
while 1:
    wPageNbr = wPage + str(i+1)
    webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist + wSort + wPageNbr
    page = requests.get(webpage)
    tree = html.fromstring(page.content)
    #no matches
    matches = tree.xpath('//strong/text()')
    if i == 0 and "No results were found" in str(matches):
        print "No results found for that search"
        exit(0)
        os.remove(webfile)
    #parse number of addresses returned
    #some searches return 2 items: ['Found N results', 'junk']
    #some searches return 3 items: ['Filter this search','Found N results','junk']
    if i == 0:
        match = tree.xpath('//div[@class="header_text"]/text()')
        if len(match) > 2: match.pop(0) #remove first element if 2
        match = [int(s) for s in match[0].split() if s.isdigit()]
        addrFound = match[0]
        if addrWant != "all": addrWant = min(addrWant,addrFound)
        print str(addrFound) + " matches found (" + str(addrWant) + " will be saved)"
        print ""
    #split names, addresses into lists
    nms = tree.xpath('//span[@class="header_text3"]/text()')
    if len(nms) == 0: break
    addr = tree.xpath('//span[@class="text3"]/text()')
    addr = [t.replace("\r\n", "") for t in addr]
    addr = filter(None, (t.strip() for t in addr))
    street = [s.split(',')[0] for s in addr]
    city = [c.split(',')[1].strip() for c in addr]
    state = [s[-8:][:2] for s in addr]
    zip = [z[-5:] for z in addr]
    #get usdirectory.com categories
    category = tree.xpath('//a/text()')
    category = [c.strip() for c in category]
    category = filter(None, category)
    pattern = re.compile(r"^[A-Z\s&,-]+$")
    category = [x for x in category if pattern.match(x)]
    #screen feedback
    print "retrieving page " + str(i+1) + ": " + str(len(nms)) + " addresses"
    if verbose == True:
        print ""
        print "Names: \n" + str(nms)
        print ""
        print "Addresses: \n" + str(addr)
        print ""
        print "Categories: \n" + str(category)
        print "----------------------------------------------------------------------------------------"
        print ""
    #data integrity check - make sure all lists have same # of items
    lenData = [len(category),len(nms),len(addr),len(street),len(city),len(state),len(zip)]
    if len(set(lenData)) != 1:
        print "Data parsing issue. One or more lists has an incorrect number of items. Program will exit."
        exit(0)
    if verbose == True:
        if len(set(lenData)) == 1: print "Verified: each list has " + str(len(nms)) + " items in it."
    #write addresses to file
    if store == "txt" or store == "csv":
        addrList = []
        for j in range(len(nms)):
            if addrUnique(nms[j]+' '+street[j]) == True:
                if store == "txt":
                    addrList.append(nms[j]+'\n'+street[j]+'\n'+city[j]+', '+state[j]+' '+zip[j]+'\n\n')
                if store == "csv": addrList.append(nms[j]+',' +street[j]+',' +city[j]+' ' +state[j]+' '+zip[j]+'\n')
                addrSaved += 1
            else:
                dupes += 1
                print " * duplicate address found: " + nms[j] + ", " + street[j]
            addrReturned += 1
            if addrWant != "all":
                if addrSaved >= addrWant: break
        f = open(webfile,"a")
        for address in addrList: f.write(address)
        f.close
    #write addresses to database
    if store == "db":
        for j in range(len(nms)):
            dupeRow = False
            cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?,?,?,?,?)"
            #(datasrc,search,category,name,street,city,state,zip,loaddt) "
            Vals = datasrc,search,category[j],nms[j],street[j],city[j],state[j],zip[j],str(loaddt)
            if verbose == True: print cSQL + ',' + str(Vals)
            try: db.execute(cSQL, Vals)
            except (pyodbc.Error) as programError:
                if str(programError).find("UNIQUE constraint failed") > 0:
                    dupeRow = True
                    dupes +=1
                    print " * duplicate address found: " + nms[j] + ", " + street[j]
                pass
            addrReturned += 1
            if dupeRow == False:
                addrSaved += 1
            if addrWant != "all":
                if addrSaved >= addrWant: break
        conn.commit()
    if addrSaved >= addrFound or addrSaved >= addrWant: break
    i += 1
    time.sleep(2)
#finish
if (store == "csv" or store == "txt"):
    print "\nFinished\nWrote " + str(addrSaved) + " addresses to file " + webfile
elif store == "db":
    db.close()
    conn.close()
    print "\nFinished\nStored " + str(addrSaved) + " addresses in database: " + dbName
if dupes > 0: print "(" + str(dupes) + " duplicate addresses ignored)"
#timer
endTime = time.clock()
print "processing time: %.2g seconds" %(endTime-startTime)
#bug in www.usdirectory.com code: usually overreports matches by 1
if (addrWant == "all") and (addrReturned != addrFound):
    print "Note: " + datasrc + " reported " + str(addrFound) + " matches, but returned " + str(addrReturned)
========================================================================
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-05-06 21:30 +0100 |
| Message-ID | <mailman.436.1462566630.32212.python-list@python.org> |
| In reply to | #108236 |
On 2016-05-06 20:10, DFS wrote:
> getAddresses.py
>
> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
> database, or writes them to text files for mailing labels, etc
>
> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
> out how many Taco Bells are within 10 miles of you, and store all the
> addresses in your own address database.
>
> No more convoluted Googling, or hitting the 'Next Page' button, or
> fumbling with the Yellow Pages...
>
> Note: the db structure is flat on purpose, and the .csv files aren't
> quote delimited.
>
> Put the program in its own directory. It creates the SQLite database
> there, and writes files there, too.
>
> Reviews of code, bug reports, criticisms, suggestions for improvement,
> etc are all welcome.
>
OK, you asked for it... :-)
1. It's shorter and clearer not to compare with True or False:
    if verbose:
and:
    if not dupeRow:
2. You can print a blank line with an empty print statement:
    print
3. When looking for unique items, a set is a better choice than a list:
    addrCheck = set()
    def addrUnique(addr):
        if addr not in addrCheck:
            x = True
            addrCheck.add(addr)
        else:
            x = False
        return x
4. Try string formatting instead of multiple concatenation:
    print "%s arguments" % argCnt
5. Strings have a .join method, and when you combine it with string slicing:
    keyw = "+".join(sys.argv[1 : argCnt - 5])
6. Another example of string formatting:
    search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)
7. It's recommended to use the 'with' statement when handling files:
    with open(webfile, "w") as f:
        if store == "csv":
            f.write("Name,Address,CityStateZip\n")
If you don't want to use the 'with' statement, note that closing the
file is:
    f.close()
It needs the "()"!
8. When using SQL, you shouldn't try to insert the values yourself; you
should use parametrised queries:
    cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
    if verbose:
        print cSQL
    db.execute(cSQL, (datasrc, search))
    conn.commit()
It'll insert the values where the "?" are and will do any necessary
quoting itself. (Actually, some drivers use "?", others use "%s", so if
it doesn't work with one, try the other.)
The way you wrote it, it would fail if a value contained a "'".
It's that kind of thing that leads to SQL injection attacks.
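For anyone following along, here is a minimal sketch of point 8. It makes several assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of pyodbc, it is written for Python 3 (the code in this thread is Python 2), and the helper name delete_search and the sample row are made up for illustration.

```python
import sqlite3

def delete_search(conn, datasrc, search):
    """Delete the rows of one earlier search via a parameterized query.

    Hypothetical helper: the driver binds the two values to the "?"
    placeholders, so no manual quoting is involved.
    """
    cur = conn.execute(
        "DELETE FROM addresses WHERE datasrc = ? AND search = ?",
        (datasrc, search),
    )
    conn.commit()
    return cur.rowcount

# In-memory database with a cut-down, illustrative table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (datasrc, search, name)")
conn.execute(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    ("www.usdirectory.com", "joe's diner Knoxville TN 10 all", "Joe's Diner"),
)
conn.commit()

# The apostrophe in the search string would break a hand-built SQL string;
# with placeholders it is passed through as plain data.
deleted = delete_search(
    conn, "www.usdirectory.com", "joe's diner Knoxville TN 10 all"
)
remaining = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
print(deleted, remaining)
```

The same pattern should carry over to pyodbc with the SQLite ODBC driver, which the original INSERT in the program already relies on.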
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-06 19:12 -0400 |
| Message-ID | <ngj86q$ia$1@dont-email.me> |
| In reply to | #108237 |
On 5/6/2016 4:30 PM, MRAB wrote:
> On 2016-05-06 20:10, DFS wrote:
>> getAddresses.py
>>
>> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
>> database, or writes them to text files for mailing labels, etc
>>
>> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
>> out how many Taco Bells are within 10 miles of you, and store all the
>> addresses in your own address database.
>>
>> No more convoluted Googling, or hitting the 'Next Page' button, or
>> fumbling with the Yellow Pages...
>>
>> Note: the db structure is flat on purpose, and the .csv files aren't
>> quote delimited.
>>
>> Put the program in its own directory. It creates the SQLite database
>> there, and writes files there, too.
>>
>> Reviews of code, bug reports, criticisms, suggestions for improvement,
>> etc are all welcome.
>>
> OK, you asked for it... :-)
>
> 1. It's shorter and clearer not to compare with True or False:
>
> if verbose:
>
> and:
>
> if not dupeRow:
Done. It will take some getting used to, though. I like that it's
shorter, but I could do the same in VBA and almost always chose not to.
> 2. You can print a blank line with an empty print statement:
>
> print
Done. I actually like the way print looks better than print ""
> 3. When looking for unique items, a set is a better choice than a list:
>
>     addrCheck = set()
>
>     def addrUnique(addr):
>         if addr not in addrCheck:
>             x = True
>             addrCheck.add(addr)
>         else:
>             x = False
>         return x
Done.
I researched this just now on StackOverflow:
"Sets are significantly faster when it comes to determining if an object
is present in the set"
and
"lists are very nice to sort and have order while sets are nice to use
when you don't want duplicates and don't care about order."
The speed difference won't matter here in my little app, but it's better
to use the right construct for the job.
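A small sketch of the set-based check, for reference. It is written for Python 3, and the names seen and is_new plus the sample rows are illustrative assumptions, not part of the program. Tuples are used as keys here instead of the concatenated name and street string.

```python
seen = set()

def is_new(key):
    """Return True the first time key is seen, False on every repeat."""
    if key in seen:
        return False
    seen.add(key)
    return True

rows = [
    ("Taco Bell", "1 Main St"),
    ("Taco Bell", "9 Oak Ave"),
    ("Taco Bell", "1 Main St"),  # repeat of the first row
]

# Membership tests on a set are hash lookups, so this stays fast
# even when many addresses have been collected.
unique_rows = [row for row in rows if is_new(row)]
print(len(unique_rows))
```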
> 4. Try string formatting instead of multiple concatenation:
>
>     print "%s arguments" % argCnt
You're referring to this line:
    print str(argCnt) + " arguments"
Is there a real benefit of using string formatting here? (other than
the required str() conversion)
> 5. Strings have a .join method, and when you combine it with string
> slicing:
>
> keyw = "+".join(sys.argv[1 : argCnt - 5])
Slick. Works a treat, and saved 2 lines of code. String handling is
another area in which python shines compared to VB.
> 6. Another example of string formatting:
>
> search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)
Done. It's shorter, and doesn't require the str() conversion I had to
do on several of the items.
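To make the str() point concrete, here is a short sketch. It assumes Python 3, where the wording of the error differs from Python 2, and the values are made up.

```python
miles = 10

# Concatenating a str and an int raises TypeError in both Python 2 and 3.
try:
    label = "Radius: " + miles
except TypeError:
    label = None

# %s calls str() on each value, so mixed types need no explicit conversion.
formatted = "Radius: %s miles" % miles
search = "%s %s %s %s %s" % ("boxing+gyms", "Detroit", "MI", miles, 5)
print(formatted)
print(search)
```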
If I can remember to use it, it should eliminate these:
"TypeError: cannot concatenate 'str' and 'int' objects"
> 7. It's recommended to use the 'with' statement when handling files:
>
>     with open(webfile, "w") as f:
>         if store == "csv":
>             f.write("Name,Address,CityStateZip\n")
Done. I read that using 'with' means Python closes the file even if an
exception occurs. So a definite benefit.
> If you don't want to use the 'with' statement, note that closing the
> file is:
>
> f.close()
>
> It needs the "()"!
I used close() in 1 place, but close without parens in 2 other places.
So it works either way. Good catch.
(it's moot now: all 'f.open()/f.close()' replaced by 'with open()')
> 8. When using SQL, you shouldn't try to insert the values yourself; you
> should use parametrised queries:
>
>     cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
>     if verbose:
>         print cSQL
>     db.execute(cSQL, (datasrc, search))
>     conn.commit()
>
> It'll insert the values where the "?" are and will do any necessary
> quoting itself. (Actually, some drivers use "?", others use "%s", so if
> it doesn't work with one, try the other.)
>
> The way you wrote it, it would fail if a value contained a "'". It's
> that kind of thing that leads to SQL injection attacks.
Fixed.
You'll notice later on in the code I used the parameterized method for
INSERTS. I hate the look of that method, but it does make dealing with
apostrophes easier, and makes it safer as you say.
Thanks for the code review, MRAB. Good improvements.
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2016-05-06 16:29 -0700 |
| Message-ID | <mailman.439.1462577357.32212.python-list@python.org> |
| In reply to | #108243 |
On 05/06/2016 04:12 PM, DFS wrote:
> On 5/6/2016 4:30 PM, MRAB wrote:
>> If you don't want to use the 'with' statement, note that closing the
>> file is:
>>
>> f.close()
>>
>> It needs the "()"!
>
> I used close() in 1 place, but close without parens in 2 other places.
> So it works either way. Good catch.
No, it doesn't. `f.close` simply returns the close function, it doesn't
call it. The "it works" was simply because Python closed the files for
you later.
Not a big deal in a small program like this, but still a mistake.
--
~Ethan~
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-06 19:58 -0400 |
| Message-ID | <ngjatb$7g9$1@dont-email.me> |
| In reply to | #108244 |
On 5/6/2016 7:29 PM, Ethan Furman wrote:
> On 05/06/2016 04:12 PM, DFS wrote:
>> On 5/6/2016 4:30 PM, MRAB wrote:
>
>>> If you don't want to use the 'with' statement, note that closing the
>>> file is:
>>>
>>> f.close()
>>>
>>> It needs the "()"!
>>
>> I used close() in 1 place, but close without parens in 2 other places.
>> So it works either way. Good catch.
>
> No, it doesn't. `f.close` simply returns the close function, it doesn't
> call it. The "it works" was simply because Python closed the files for
> you later.
>
> Not a big deal in a small program like this, but still a mistake.
Yes.
Check out the answer by 'unutbu' here:
http://stackoverflow.com/questions/1832528/is-close-necessary-when-using-iterator-on-a-python-file-object
He says "I...checked /proc/PID/fd for when the file descriptor was
closed. It appears that when you break out of the for loop, the file is
closed for you."
Improper f.close didn't seem to affect any of the files my program wrote
- and I checked a lot of them when I was writing the code.
Maybe it worked because the last time the file was written to was in a
for loop, so I got lucky and the files weren't truncated? Don't know.
Did you notice any other gotchas in the program?
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-05-07 01:38 +0100 |
| Message-ID | <mailman.442.1462581532.32212.python-list@python.org> |
| In reply to | #108247 |
On 2016-05-07 00:58, DFS wrote:
> On 5/6/2016 7:29 PM, Ethan Furman wrote:
>> On 05/06/2016 04:12 PM, DFS wrote:
>>> On 5/6/2016 4:30 PM, MRAB wrote:
>>
>>>> If you don't want to use the 'with' statement, note that closing the
>>>> file is:
>>>>
>>>> f.close()
>>>>
>>>> It needs the "()"!
>>>
>>> I used close() in 1 place, but close without parens in 2 other places.
>>> So it works either way. Good catch.
>>
>> No, it doesn't. `f.close` simply returns the close function, it doesn't
>> call it. The "it works" was simply because Python closed the files for
>> you later.
>>
>> Not a big deal in a small program like this, but still a mistake.
>
>
> Yes.
>
> Check out the answer by 'unutbu' here:
>
> http://stackoverflow.com/questions/1832528/is-close-necessary-when-using-iterator-on-a-python-file-object
>
> He says "I...checked /proc/PID/fd for when the file descriptor was
> closed. It appears that when you break out of the for loop, the file is
> closed for you."
>
If you read the comments for that answer, you'll find the explanation.
> Improper f.close didn't seem to affect any of the files my program wrote
> - and I checked a lot of them when I was writing the code.
>
> Maybe it worked because the last time the file was written to was in a
> for loop, so I got lucky and the files weren't truncated? Don't know.
>
> Did you notice any other gotchas in the program?
>
| From | Stephen Hansen <me+python@ixokai.io> |
|---|---|
| Date | 2016-05-06 23:03 -0700 |
| Message-ID | <mailman.445.1462601024.32212.python-list@python.org> |
| In reply to | #108247 |
On Fri, May 6, 2016, at 04:58 PM, DFS wrote:
> Improper f.close didn't seem to affect any of the files my program wrote
> - and I checked a lot of them when I was writing the code.
To be clear, it's not an "improper" f.close. That command is simply not
closing the file. Period. "f.close" is how you get the 'close' function
from the 'f' object, and then... you do nothing with it.
If you removed "f.close" entirely, you'd get the exact same behavior as
you have now. The "f.close" does nothing.
That said, in CPython semantics, closing a file explicitly is often not
required. CPython is reference-counted. Once the number of references to
an object reaches 0, CPython deletes the object. This is an implementation
detail of CPython and not a guarantee of the Python language itself, which
is why explicit close calls are preferred.
So while 'f.close' does nothing, CPython might be closing the file
*anyways*, and it might work... but that 'might' is hard to reason about
without a deeper understanding, so using explicit closing mechanics
(either via f.close() or with or something else) is strongly
recommended.
For example, if you were to do:
    for item in sequence:
        f = open(item, 'wb')
        f.write("blah")
It probably works fine. The first time through, 'f' is bound to a file
object, and you write to it. The second time through, 'f' is bound to a
*new file object*, and the original file object now has 0 references, so
is automatically deleted.
The last time through, f is not closed: the 'for loop' is not a
scope which deletes its internal name bindings when it's done. So that
'f' will likely remain open until the very end of the current function,
which may be an issue for you.
Implicit closing actually works in a large number of situations in
CPython, but it isn't a good thing to rely on. It only works in simple
operations where you aren't accidentally storing a reference somewhere
else. You have to keep track of the references in your head to make sure
things will get closed at proper times.
The 'with' statement clearly defines when resources should be closed, so
it's preferred (as I see you've adopted from other responses). But it's
also needed in other Python implementations which might not follow
CPython's reference counting scheme.
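A quick sketch of that guarantee, assuming Python 3 and a throwaway temporary file; the exception is simulated purely for illustration.

```python
import os
import tempfile

# Throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

try:
    with open(path, "w") as f:
        f.write("line 1\n")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception,
# and closing flushed the buffered write to disk.
closed_after_with = f.closed
with open(path) as g:
    contents = g.read()
print(closed_after_with, repr(contents))
```

Nothing here depends on reference counting, so it should behave the same on other implementations.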
I'm not giving further feedback because MRAB caught everything I thought
was an issue.
--
Stephen Hansen
m e @ i x o k a i . i o
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2016-05-07 18:24 +1200 |
| Message-ID | <dp5g1fF7ndoU1@mid.individual.net> |
| In reply to | #108247 |
DFS wrote:
> Maybe it worked because the last time the file was written to was in a
> for loop, so I got lucky and the files weren't truncated? Don't know.
It "works" because CPython disposes of objects as soon as they are not
referenced anywhere. Other implementations of Python (e.g. Jython, PyPy)
might not do that.
--
Greg
| From | alister <alister.ware@ntlworld.com> |
|---|---|
| Date | 2016-05-07 08:51 +0000 |
| Message-ID | <G0iXy.190130$rZ7.62030@fx33.am4> |
| In reply to | #108253 |
On Sat, 07 May 2016 18:24:45 +1200, Gregory Ewing wrote:
> DFS wrote:
>> Maybe it worked because the last time the file was written to was in a
>> for loop, so I got lucky and the files weren't truncated? Don't know.
>
> It "works" because CPython disposes of objects as soon as they are not
> referenced anywhere. Other implementations of Python (e.g. Jython, PyPy)
> might not do that.
To provide an example, try the following code in the interactive
interpreter:
>>> f=open('somefile','w')
>>> print f.write('line 1\n')
None
>>> print f.close
<built-in method close of file object at 0x7fb4c9580660>
>>> print f.write('line 2\n')
None
>>> print f.close()
None
>>> print f.write('line 3\n')
ValueError: I/O operation on closed file
somefile will contain
line 1
line 2
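The same experiment as a script, for anyone on Python 3 where the session above will not run as typed. This is only a sketch: the temporary path is an assumption made so that nothing is written to the current directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "somefile")

f = open(path, "w")
f.write("line 1\n")
f.close                      # looks up the bound method, never calls it
still_open = not f.closed    # so the file is still open at this point
f.write("line 2\n")
f.close()                    # the call with "()" actually closes the file

try:
    f.write("line 3\n")
    write_failed = False
except ValueError:           # I/O operation on closed file
    write_failed = True

with open(path) as g:
    lines = g.read().splitlines()
print(still_open, write_failed, lines)
```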
--
manager in the cable duct
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-05-07 15:59 +0200 |
| Message-ID | <mailman.454.1462629557.32212.python-list@python.org> |
| In reply to | #108236 |
DFS wrote:
> getAddresses.py
>
> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
> database, or writes them to text files for mailing labels, etc
>
> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
> out how many Taco Bells are within 10 miles of you, and store all the
> addresses in your own address database.
>
> No more convoluted Googling, or hitting the 'Next Page' button, or
> fumbling with the Yellow Pages...
>
> Note: the db structure is flat on purpose, and the .csv files aren't
> quote delimited.
>
> Put the program in its own directory. It creates the SQLite database
> there, and writes files there, too.
>
> Reviews of code, bug reports, criticisms, suggestions for improvement,
> etc are all welcome.
- Avoid module-level code and global variables
- Use functions that do one thing and operate on explicitly passed arguments
- You have
    if store == ...:
        ...
sprinkled across your module. You will have to change your code in many
places if you want to add another output format. With a linear structure
like
    import itertools
    STORE_FUNCS = {
        "db": store_in_db,
        "txt": store_as_text,
        "csv": store_as_csv,
    }
    def main():
        args = read_arguments()
        records = read_records(args)
        records = unique(records)
        if args.limit:
            records = itertools.islice(records, args.limit)
        STORE_FUNCS[args.storage_format](args, records)
    if __name__ == "__main__":
        main()
further enhancements will be a lot easier to implement.
The main() function avoids accidental uncontrolled globals. If you want one
you have to declare it:
    def main():
        global verbose
        args = read_arguments()
        verbose = args.verbose
        ...
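To show how the pieces could fit together, here is a runnable sketch of the dispatch-table idea. Everything in it is an assumption made for illustration: the Args tuple stands in for parsed command-line arguments, the store functions return strings instead of writing files, the "db" entry is left out, and run() plays the role of main(). It is written for Python 3.

```python
import itertools
from collections import namedtuple

# Stand-in for the result of read_arguments(); field names are hypothetical.
Args = namedtuple("Args", "storage_format limit")

def store_as_text(args, records):
    """Three-line mailing-label style, reduced to name and street."""
    return "".join("%s\n%s\n\n" % rec for rec in records)

def store_as_csv(args, records):
    """Header plus one unquoted line per record."""
    return "Name,Address\n" + "".join("%s,%s\n" % rec for rec in records)

# Adding an output format means adding one function and one entry here.
STORE_FUNCS = {"txt": store_as_text, "csv": store_as_csv}

def unique(records):
    """Yield each record once, in first-seen order."""
    seen = set()
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            yield rec

def run(args, records):
    records = unique(records)
    if args.limit:
        records = itertools.islice(records, args.limit)
    return STORE_FUNCS[args.storage_format](args, records)

sample = [
    ("A", "1 Main St"),
    ("A", "1 Main St"),   # duplicate, dropped by unique()
    ("B", "2 Oak Ave"),
    ("C", "3 Elm Rd"),    # cut off by limit=2
]
out = run(Args("csv", 2), sample)
print(out)
```

Because unique() and islice() are lazy, only as many records as the limit requires are pulled from the source.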