A fun python CLI program for all to enjoy!

Started by: DFS <nospam@dfs.com>
First post: 2016-05-06 15:10 -0400
Last post: 2016-05-07 15:59 +0200
Articles: 10 (7 participants)

Contents

  A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 15:10 -0400
    Re: A fun python CLI program for all to enjoy! MRAB <python@mrabarnett.plus.com> - 2016-05-06 21:30 +0100
      Re: A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 19:12 -0400
        Re: A fun python CLI program for all to enjoy! Ethan Furman <ethan@stoneleaf.us> - 2016-05-06 16:29 -0700
          Re: A fun python CLI program for all to enjoy! DFS <nospam@dfs.com> - 2016-05-06 19:58 -0400
            Re: A fun python CLI program for all to enjoy! MRAB <python@mrabarnett.plus.com> - 2016-05-07 01:38 +0100
            Re: A fun python CLI program for all to enjoy! Stephen Hansen <me+python@ixokai.io> - 2016-05-06 23:03 -0700
            Re: A fun python CLI program for all to enjoy! Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-07 18:24 +1200
              Re: A fun python CLI program for all to enjoy! alister <alister.ware@ntlworld.com> - 2016-05-07 08:51 +0000
    Re: A fun python CLI program for all to enjoy! Peter Otten <__peter__@web.de> - 2016-05-07 15:59 +0200

#108236 — A fun python CLI program for all to enjoy!

From: DFS <nospam@dfs.com>
Date: 2016-05-06 15:10 -0400
Subject: A fun python CLI program for all to enjoy!
Message-ID: <ngiq0u$brn$1@dont-email.me>
getAddresses.py

Scrapes addresses from www.usdirectory.com and stores them in a SQLite 
database, or writes them to text files for mailing labels, etc

Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find 
out how many Taco Bells are within 10 miles of you, and store all the 
addresses in your own address database.

No more convoluted Googling, or hitting the 'Next Page' button, or 
fumbling with the Yellow Pages...

Note: the db structure is flat on purpose, and the .csv files aren't 
quote delimited.

Put the program in its own directory.  It creates the SQLite database 
there, and writes files there, too.

Reviews of code, bug reports, criticisms, suggestions for improvement, 
etc are all welcome.

Enjoy!



========================================================================
#getAddresses.py

import os, sys, requests, time, datetime
from lxml import html
import pyodbc, sqlite3, re


#show values of variables, HTML content, etc
#set it to False for short/concise program output
verbose = False
if verbose == True:
	print "The verbose setting is turned On."
	print ""

	
#check if address is unique
addrCheck = []
def addrUnique(addr):
	if addr not in addrCheck:
		x = True
		addrCheck.append(addr)
	else: x = False	
	return x

	
#validate and parse command line
def showHelp():
	print ""
	print " Enter search word(s), city or zip, state, miles to search, txt or csv or db, # addresses to save (no commas)"
	print ""
	print " eg: restaurant Knoxville TN 10 txt 50"
	print "     search for restaurants within 10 miles of Knoxville TN, and write"
	print "     the first 50 addresses to a txt file"
	print ""
	print " eg: furniture 30303 GA 20 csv all"
	print "     search for furniture within 20 miles of zip 30303 GA,"
	print "     and write all results to a csv file"
	print ""
	print " eg: boxing gyms Detroit MI 10 db 5"
	print "     search for boxing gyms within 10 miles of Detroit MI, and store"
	print "     the first 5 results in a database"
	print ""
	print " All entries are case-insensitive (ie TX or tx are acceptable)"
	exit(0)

argCnt = len(sys.argv)
if argCnt < 7: showHelp()
if verbose == True:
	print ""
	print str(argCnt) + " arguments"

keyw = ""							#eg restaurant, boxing gym
if argCnt == 7:	keyw = sys.argv[1]	#one search word
if argCnt >  7: 					#multiple search words
	for i in range(1,argCnt-5):
		keyw = keyw + sys.argv[i] + "+"
	keyw = keyw[:-1]			#drop trailing + sign
cityzip  = sys.argv[argCnt-5]   #eg Atlanta or 30339
state    = sys.argv[argCnt-4]   #eg GA
miles    = sys.argv[argCnt-3]   #eg 5,10,20,30,50 (website allows max 30)
store    = sys.argv[argCnt-2]   #write address to file or database
addrWant = sys.argv[argCnt-1]   #eg save All or number >0

if addrWant.lower() != "all":	#how many addresses to save
	if addrWant.isdigit() == False: showHelp()
	if addrWant == "0"            : showHelp()
	addrWant = int(addrWant)
elif addrWant.lower() == "all": addrWant = addrWant.lower()
else: addrWant = int(addrWant)

if store != "csv" and store != "txt" and store != "db": showHelp()


#begin timing the code
startTime = time.clock()	


#website, SQLite db, search string, current date/time for use with db
datasrc = "www.usdirectory.com"
dbName  = "addresses.sqlite"
search  = keyw + " " + str(cityzip) + " " + state + " " + str(miles) + " " + str(addrWant)
loaddt = datetime.datetime.now()


#write addresses to file
#each time the same search is done, the file is deleted and recreated
if store == "csv" or store == "txt":
	#csv will write in .csv format - header and 1 line per address
	#txt will write out 3 lines per address, then blank before next address
	webfile  = "usdirectory.com_"+keyw+"_"+cityzip+"_"+state+"."+store
	f = open(webfile,"w")
	if store == "csv": f.write("Name,Address,CityStateZip\n")
	f.close


#store addresses in database	
cSQL = ""
if store == "db":	
	#creates a SQLite database that Access 2003 can't read
	#conn = sqlite3.connect(dbName)

	#also creates a SQLite database that Access 2003 can't read
	conn = pyodbc.connect('Driver={SQLite3 ODBC Driver};Database=' + dbName)
	db   = conn.cursor()

	cSQL =  "CREATE TABLE If Not Exists ADDRESSES "
	cSQL += "(datasrc, search, category, name, street, city, state, zip, loaddt, "
	cSQL += "PRIMARY KEY (datasrc, search, name, street));"
	db.execute(cSQL)
	
	# cSQL =  "CREATE TABLE If Not Exists CATEGORIES "
	# cSQL += "(catID INTEGER PRIMARY KEY, catDesc);"
	# db.execute(cSQL)
	# db.execute("CREATE UNIQUE INDEX If Not Exists UIDX_CATDESC ON CATEGORIES (catDesc);")
	
	conn.commit()
	if verbose == True:
		print("connected to database: " + dbName)
		print cSQL
		print("created table: addresses")
		print("")
	
	
if verbose == True:
	print "Search summary"
	print "------------------------------"
	print "Keywords: " + keyw
	print "City/Zip: " + cityzip
	print "State   : " + state
	print "Radius  : " + str(miles)    + " miles"
	print "Save    : " + str(addrWant) + " addresses to " + store
	print "------------------------------"
	print ""
	
#build url
wBase    = "http://www.usdirectory.com"
wForm    = "/ypr.aspx?fromform=qsearch"
wKeyw   = "&qhqn=" + keyw
wCityZip = "&qc="   + cityzip
wState   = "&qs="   + state
wDist    = "&rg="   + str(miles)
wSort    = "&sb=a2z"  #sort alpha
wPage    = "&ap="   #used with the results page number
webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist
if verbose == True:
	print "Search url: \n" + webpage
	print ""


#delete previous results of identical search
if store == "db":
	cSQL = "DELETE FROM addresses "
	cSQL += "WHERE datasrc = '" + datasrc + "' "
	cSQL += "AND   search  = '" + search + "';"
	if verbose == True: print cSQL
	db.execute(cSQL)
	conn.commit()

#query web server, save results
print "searching..."
i = 0
dupes = 0
addrReturned = 0
addrSaved = 0
while 1:
	wPageNbr = wPage + str(i+1)
	webpage  = wBase + wForm + wKeyw + wCityZip + wState + wDist + wSort + wPageNbr
	page = requests.get(webpage)
	tree = html.fromstring(page.content)
	
	#no matches
	matches = tree.xpath('//strong/text()') 	
	if i == 0 and "No results were found" in str(matches):
		print "No results found for that search"
		if store != "db": os.remove(webfile)	#delete the empty output file before exiting
		exit(0)
		
	#parse number of addresses returned
	#some searches return 2 items: ['Found N results', 'junk']
	#some searches return 3 items: ['Filter this search','Found N results','junk']
	if i == 0:
		match  = tree.xpath('//div[@class="header_text"]/text()')
		if len(match) > 2: match.pop(0) #remove first element if there are 3
		match = [int(s) for s in match[0].split() if s.isdigit()]
		addrFound = match[0]
		if addrWant != "all": addrWant = min(addrWant,addrFound)
		print str(addrFound) + " matches found (" + str(addrWant) + " will be saved)"
		print ""
	
	
	#split names, addresses into lists
	nms   = tree.xpath('//span[@class="header_text3"]/text()')
	if len(nms) == 0: break
	addr   = tree.xpath('//span[@class="text3"]/text()')
	addr   = [t.replace("\r\n", "")  for t in addr]
	addr   = filter(None, (t.strip() for t in addr))
	street = [s.split(',')[0] for s in addr]
	city   = [c.split(',')[1].strip() for c in addr]
	state  = [s[-8:][:2] for s in addr]
	zip    = [z[-5:] for z in addr]

	#get usdirectory.com categories
	category = tree.xpath('//a/text()')
	category = [c.strip() for c in category]
	category = filter(None, category)
	pattern  = re.compile(r"^[A-Z\s&,-]+$")
	category = [x for x in category if pattern.match(x)]
	
	#screen feedback
	print "retrieving page " + str(i+1)	+ ": " + str(len(nms)) + " addresses"
	if verbose == True:
		print ""
		print "Names: \n" + str(nms)
		print ""
		print "Addresses: \n" + str(addr)
		print ""
		print "Categories: \n" + str(category)
		print "----------------------------------------------------------------------------------------"
		print ""
		
	#data integrity check - make sure all lists have same # of items
	lenData = [len(category),len(nms),len(addr),len(street),len(city),len(state),len(zip)]
	if len(set(lenData)) != 1:
		print "Data parsing issue.  One or more lists has an incorrect number of items.  Program will exit."
		exit(0)
	if verbose == True:
		if len(set(lenData)) == 1: print "Verified: each list has " + str(len(nms)) + " items in it."
	
	
	#write addresses to file
	if store == "txt" or store == "csv":
		addrList = []
		for j in range(len(nms)):
			if addrUnique(nms[j]+' '+street[j]) == True:
				if store == "txt": addrList.append(nms[j]+'\n'+street[j]+'\n'+city[j]+', '+state[j]+' '+zip[j]+'\n\n')
				if store == "csv": addrList.append(nms[j]+',' +street[j]+',' +city[j]+' ' +state[j]+'  '+zip[j]+'\n')
				addrSaved += 1
			else:
				dupes += 1
				print " * duplicate address found: " + nms[j] + ", " + street[j]
			addrReturned += 1			
			if addrWant != "all":
				if addrSaved >= addrWant: break
			
		f = open(webfile,"a")
		for address in addrList: f.write(address)
		f.close

		
	#write addresses to database
	if store == "db":
		for j in range(len(nms)):
			dupeRow = False
			cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?,?,?,?,?)" #(datasrc,search,category,name,street,city,state,zip,loaddt)
			Vals = datasrc,search,category[j],nms[j],street[j],city[j],state[j],zip[j],str(loaddt)
			if verbose == True: print cSQL + ',' + str(Vals)
			try: db.execute(cSQL, Vals)
			except (pyodbc.Error) as programError:
				if str(programError).find("UNIQUE constraint failed") > 0:
					dupeRow = True
					dupes +=1
					print " * duplicate address found: " + nms[j] + ", " + street[j]
					pass
			addrReturned += 1		
			if dupeRow == False:
				addrSaved += 1
			if addrWant != "all":
				if addrSaved >= addrWant: break
		conn.commit()
		
		
	if addrSaved >= addrFound or addrSaved >= addrWant: break
	i += 1
	time.sleep(2)

	
#finish
if (store == "csv" or store == "txt"):
	print "\nFinished\nWrote " + str(addrSaved) + " addresses to file " + webfile

elif store == "db":
	db.close()
	conn.close()
	print "\nFinished\nStored " + str(addrSaved) + " addresses in database: " + dbName
	
if dupes > 0: print "(" + str(dupes) + " duplicate addresses ignored)"
	
#timer
endTime = time.clock()
print "processing time: %.2g seconds" %(endTime-startTime)

#bug in www.usdirectory.com code: usually overreports matches by 1
if (addrWant == "all") and (addrReturned != addrFound):
	print "Note: " + datasrc + " reported " + str(addrFound) + " matches, but returned " + str(addrReturned)
========================================================================	



#108237

From: MRAB <python@mrabarnett.plus.com>
Date: 2016-05-06 21:30 +0100
Message-ID: <mailman.436.1462566630.32212.python-list@python.org>
In reply to: #108236
On 2016-05-06 20:10, DFS wrote:
> getAddresses.py
>
> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
> database, or writes them to text files for mailing labels, etc
>
> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
> out how many Taco Bells are within 10 miles of you, and store all the
> addresses in your own address database.
>
> No more convoluted Googling, or hitting the 'Next Page' button, or
> fumbling with the Yellow Pages...
>
> Note: the db structure is flat on purpose, and the .csv files aren't
> quote delimited.
>
> Put the program in its own directory.  It creates the SQLite database
> there, and writes files there, too.
>
> Reviews of code, bug reports, criticisms, suggestions for improvement,
> etc are all welcome.
>
OK, you asked for it... :-)

1. It's shorter and clearer not to compare with True or False:

        if verbose:

    and:

        if not dupeRow:

2. You can print a blank line with an empty print statement:

        print

3. When looking for unique items, a set is a better choice than a list:

        addrCheck = set()

        def addrUnique(addr):
            if addr not in addrCheck:
                x = True
                addrCheck.add(addr)
            else:
                x = False
            return x

4. Try string formatting instead of multiple concatenation:

        print "%s arguments" % argCnt

5. Strings have a .join method, and when you combine it with string slicing:

        keyw = "+".join(sys.argv[1 : argCnt - 5])

6. Another example of string formatting:

        search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)

7. It's recommended to use the 'with' statement when handling files:

        with open(webfile, "w") as f:
            if store == "csv":
                f.write("Name,Address,CityStateZip\n")

    If you don't want to use the 'with' statement, note that closing the 
file is:

            f.close()

    It needs the "()"!

8. When using SQL, you shouldn't try to insert the values yourself; you 
should use parametrised queries:

        cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
        if verbose:
            print cSQL
        db.execute(cSQL, (datasrc, search))
        conn.commit()

     It'll insert the values where the "?" are and will do any necessary 
quoting itself. (Actually, some drivers use "?", others use "%s", so if 
it doesn't work with one, try the other.)

     The way you wrote it, it would fail if a value contained a "'". 
It's that kind of thing that leads to SQL injection attacks.
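
To illustrate point 8, here is a minimal sketch of a parametrised query. It assumes the standard-library sqlite3 module and an in-memory database so that it is self-contained (the original program connects through pyodbc instead); the table layout and the sample rows are made up for the example.

```python
import sqlite3

# In-memory database (an assumption for the sketch; nothing is written to disk).
conn = sqlite3.connect(":memory:")
db = conn.cursor()
db.execute("CREATE TABLE addresses (datasrc, search, name)")

# Hypothetical rows. Note the apostrophes, which would break a query
# assembled by string concatenation.
rows = [
    ("www.usdirectory.com", "Wendy's Knoxville TN 10 all", "Wendy's"),
    ("www.usdirectory.com", "pizza Knoxville TN 10 all", "Joe's Pizza"),
]
db.executemany("INSERT INTO addresses VALUES (?, ?, ?)", rows)

# The driver binds the values to the "?" placeholders itself,
# so no manual quoting or escaping is needed.
db.execute(
    "DELETE FROM addresses WHERE datasrc = ? AND search = ?",
    ("www.usdirectory.com", "Wendy's Knoxville TN 10 all"),
)
conn.commit()

remaining = db.execute("SELECT name FROM addresses").fetchall()
print(remaining)
conn.close()
```

Only the row whose search string matches is deleted, and the apostrophe in the bound value causes no syntax error.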



#108243

From: DFS <nospam@dfs.com>
Date: 2016-05-06 19:12 -0400
Message-ID: <ngj86q$ia$1@dont-email.me>
In reply to: #108237
On 5/6/2016 4:30 PM, MRAB wrote:
> On 2016-05-06 20:10, DFS wrote:
>> getAddresses.py
>>
>> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
>> database, or writes them to text files for mailing labels, etc
>>
>> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
>> out how many Taco Bells are within 10 miles of you, and store all the
>> addresses in your own address database.
>>
>> No more convoluted Googling, or hitting the 'Next Page' button, or
>> fumbling with the Yellow Pages...
>>
>> Note: the db structure is flat on purpose, and the .csv files aren't
>> quote delimited.
>>
>> Put the program in its own directory.  It creates the SQLite database
>> there, and writes files there, too.
>>
>> Reviews of code, bug reports, criticisms, suggestions for improvement,
>> etc are all welcome.
>>
> OK, you asked for it... :-)
>
> 1. It's shorter and clearer not to compare with True or False:
>
>        if verbose:
>
>    and:
>
>        if not dupeRow:


Done.  It will take some getting used to, though.  I like that it's 
shorter, but I could do the same in VBA and almost always chose not to.



> 2. You can print a blank line with an empty print statement:
>
>        print

Done.  I actually like the way print  looks better than print ""



> 3. When looking for unique items, a set is a better choice than a list:
>
>        addrCheck = set()
>
>        def addrUnique(addr):
>            if addr not in addrCheck:
>                x = True
>                addrCheck.add(addr)
>            else:
>                x = False
>            return x

Done.

I researched this just now on StackOverflow:

"Sets are significantly faster when it comes to determining if an object 
is present in the set"
and
"lists are very nice to sort and have order while sets are nice to use 
when you don't want duplicates and don't care about order."

The speed difference won't matter here in my little app, but it's better 
to use the right construct for the job.
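
As a small illustration of the set-based check, the function can also return directly instead of going through a temporary variable. This is only a sketch; the names addr_unique and seen_addresses are invented for the example and are not from the posted program.

```python
# Hypothetical names; the set remembers every address seen so far.
seen_addresses = set()

def addr_unique(addr):
    """Return True the first time addr is seen, False on every repeat."""
    if addr in seen_addresses:      # average O(1) membership test on a set
        return False
    seen_addresses.add(addr)
    return True

print(addr_unique("Taco Bell 100 Main St"))   # first sighting
print(addr_unique("Taco Bell 100 Main St"))   # repeat of the same address
```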



> 4. Try string formatting instead of multiple concatenation:
>
>        print "%s arguments" % argCnt


You're referring to this line:
print str(argCnt) + " arguments"

Is there a real benefit of using string formatting here?  (other than 
the required str() conversion)
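
One concrete difference is what happens when the str() call is forgotten. The sketch below uses a made-up variable (miles) and message text to contrast the two approaches; it is meant as an illustration, not as a statement about which style the program should use.

```python
miles = 10          # hypothetical value; an int, not a str
error_text = None

# Concatenating a str and an int raises TypeError unless str() is applied first.
try:
    message = "Radius: " + miles + " miles"
except TypeError as exc:
    error_text = str(exc)
    print("concatenation failed: %s" % error_text)

# %-formatting converts the value itself, so no str() call is needed.
message = "Radius: %s miles" % miles
print(message)
```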



> 5. Strings have a .join method, and when you combine it with string
> slicing:
>
>        keyw = "+".join(sys.argv[1 : argCnt - 5])


Slick.  Works a treat, and saved 2 lines of code.  String handling is 
another area in which python shines compared to VB.
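
For illustration, the same slicing idea can be written with negative indices, since the last five arguments are always city/zip, state, miles, store and count. A minimal sketch, using a hypothetical helper split_args and a made-up argument list rather than the real sys.argv:

```python
def split_args(argv):
    """Split an argv-style list into the keyword string and the five trailing fields."""
    # Everything between the script name and the last five items is a keyword.
    keyw = "+".join(argv[1:-5])
    cityzip, state, miles, store, addr_want = argv[-5:]
    return keyw, cityzip, state, miles, store, addr_want

# Hypothetical command line: getAddresses.py boxing gyms Detroit MI 10 db 5
example = ["getAddresses.py", "boxing", "gyms", "Detroit", "MI", "10", "db", "5"]
print(split_args(example))
```

The slice argv[1:-5] selects the same items as sys.argv[1 : argCnt - 5] and works whether one or several search words are given.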


> 6. Another example of string formatting:
>
>        search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)

Done.  It's shorter, and doesn't require the str() conversion I had to 
do on several of the items.

If I can remember to use it, it should eliminate these:
"TypeError: cannot concatenate 'str' and 'int' objects"



> 7. It's recommended to use the 'with' statement when handling files:
>
>        with open(webfile, "w") as f:
>            if store == "csv":
>                f.write("Name,Address,CityStateZip\n")


Done.  I read that using 'with' means Python closes the file even if an 
exception occurs.  So a definite benefit.
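
A minimal sketch of that behaviour, assuming a throwaway file in a temporary directory (the path and the simulated error are invented for the example):

```python
import os
import tempfile

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

try:
    with open(path, "w") as f:
        f.write("Name,Address,CityStateZip\n")
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
print(f.closed)
```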



>    If you don't want to use the 'with' statement, note that closing the
> file is:
>
>            f.close()
>
>    It needs the "()"!

I used close() in 1 place, but close without parens in 2 other places. 
So it works either way.  Good catch.

(it's moot now: all 'f.open()/f.close()' replaced by 'with open()')



> 8. When using SQL, you shouldn't try to insert the values yourself; you
> should use parametrised queries:
>
>        cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
>        if verbose:
>            print cSQL
>        db.execute(cSQL, (datasrc, search))
>        conn.commit()
>
>     It'll insert the values where the "?" are and will do any necessary
> quoting itself. (Actually, some drivers use "?", others use "%s", so if
> it doesn't work with one, try the other.)
>
>     The way you wrote it, it would fail if a value contained a "'". It's
> that kind of thing that leads to SQL injection attacks.

Fixed.

You'll notice later on in the code I used the parameterized method for 
INSERTS.  I hate the look of that method, but it does make dealing with 
apostrophes easier, and makes it safer as you say.




Thanks for the code review, MRAB.  Good improvements.



#108244

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2016-05-06 16:29 -0700
Message-ID: <mailman.439.1462577357.32212.python-list@python.org>
In reply to: #108243
On 05/06/2016 04:12 PM, DFS wrote:
> On 5/6/2016 4:30 PM, MRAB wrote:

>>    If you don't want to use the 'with' statement, note that closing the
>> file is:
>>
>>            f.close()
>>
>>    It needs the "()"!
>
> I used close() in 1 place, but close without parens in 2 other places.
> So it works either way.  Good catch.

No, it doesn't.  `f.close` simply returns the close function, it doesn't 
call it.  The "it works" was simply because Python closed the files for 
you later.

Not a big deal in a small program like this, but still a mistake.
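
The difference can be checked through the file object's closed attribute. The following is a sketch only, using a temporary file whose name is made up for the example:

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
f = open(path, "w")
f.write("line 1\n")

f.close                      # looks up the bound method, then discards it
still_open = not f.closed    # nothing was called, so the file is still open

f.close()                    # the parentheses actually call the method
now_closed = f.closed

print("%s %s" % (still_open, now_closed))
```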

--
~Ethan~



#108247

From: DFS <nospam@dfs.com>
Date: 2016-05-06 19:58 -0400
Message-ID: <ngjatb$7g9$1@dont-email.me>
In reply to: #108244
On 5/6/2016 7:29 PM, Ethan Furman wrote:
> On 05/06/2016 04:12 PM, DFS wrote:
>> On 5/6/2016 4:30 PM, MRAB wrote:
>
>>>    If you don't want to use the 'with' statement, note that closing the
>>> file is:
>>>
>>>            f.close()
>>>
>>>    It needs the "()"!
>>
>> I used close() in 1 place, but close without parens in 2 other places.
>> So it works either way.  Good catch.
>
> No, it doesn't.  `f.close` simply returns the close function, it doesn't
> call it.  The "it works" was simply because Python closed the files for
> you later.
>
> Not a big deal in a small program like this, but still a mistake.


Yes.

Check out the answer by 'unutbu' here:

http://stackoverflow.com/questions/1832528/is-close-necessary-when-using-iterator-on-a-python-file-object

He says "I...checked /proc/PID/fd for when the file descriptor was 
closed. It appears that when you break out of the for loop, the file is 
closed for you."

Improper f.close didn't seem to affect any of the files my program wrote 
- and I checked a lot of them when I was writing the code.

Maybe it worked because the last time the file was written to was in a 
for loop, so I got lucky and the files weren't truncated?  Don't know.

Did you notice any other gotchas in the program?



#108248

From: MRAB <python@mrabarnett.plus.com>
Date: 2016-05-07 01:38 +0100
Message-ID: <mailman.442.1462581532.32212.python-list@python.org>
In reply to: #108247
On 2016-05-07 00:58, DFS wrote:
> On 5/6/2016 7:29 PM, Ethan Furman wrote:
>> On 05/06/2016 04:12 PM, DFS wrote:
>>> On 5/6/2016 4:30 PM, MRAB wrote:
>>
>>>>    If you don't want to use the 'with' statement, note that closing the
>>>> file is:
>>>>
>>>>            f.close()
>>>>
>>>>    It needs the "()"!
>>>
>>> I used close() in 1 place, but close without parens in 2 other places.
>>> So it works either way.  Good catch.
>>
>> No, it doesn't.  `f.close` simply returns the close function, it doesn't
>> call it.  The "it works" was simply because Python closed the files for
>> you later.
>>
>> Not a big deal in a small program like this, but still a mistake.
>
>
> Yes.
>
> Check out the answer by 'unutbu' here:
>
> http://stackoverflow.com/questions/1832528/is-close-necessary-when-using-iterator-on-a-python-file-object
>
> He says "I...checked /proc/PID/fd for when the file descriptor was
> closed. It appears that when you break out of the for loop, the file is
> closed for you."
>
If you read the comments for that answer, you'll find the explanation.

> Improper f.close didn't seem to affect any of the files my program wrote
> - and I checked a lot of them when I was writing the code.
>
> Maybe it worked because the last time the file was written to was in a
> for loop, so I got lucky and the files weren't truncated?  Don't know.
>
> Did you notice any other gotchas in the program?
>



#108252

From: Stephen Hansen <me+python@ixokai.io>
Date: 2016-05-06 23:03 -0700
Message-ID: <mailman.445.1462601024.32212.python-list@python.org>
In reply to: #108247
On Fri, May 6, 2016, at 04:58 PM, DFS wrote:
> Improper f.close didn't seem to affect any of the files my program wrote 
> - and I checked a lot of them when I was writing the code.

To be clear, it's not an "improper" f.close. That command is simply not
closing the file. Period. "f.close" is how you get the 'close' function
from the 'f' object, and then... you do nothing with it.

If you removed "f.close" entirely, you'd get the exact same behavior as
you have now. The "f.close" does nothing.

That said, in CPython semantics, closing a file explicitly is often not
required. CPython is reference-counted. Once the reference count of an
object reaches 0, CPython deletes the object. This is an implementation
detail of CPython and not a guarantee of the Python language itself,
which is why explicit close calls are preferred.

So while 'f.close' does nothing, CPython might be closing the file
*anyways*, and it might work... but that 'might' is hard to reason about
without a deeper understanding, so using explicit closing mechanics
(either via f.close() or with or something else) is strongly
recommended. 

For example, if you were to do:

for item in sequence:
    f = open(item, 'wb')
    f.write("blah")

It probably works fine. The first time through, 'f' is bound to a file
object, and you write to it. The second time through, 'f' is bound to a
*new file object*, and the original file object now has 0 references, so
is automatically deleted. 

The last time through, f is not closed: the 'for loop' is not a
scope which deletes its internal name bindings when it's done. So that
'f' will likely remain open until the very end of the current function,
which may be an issue for you.

Implicit closing actually works in a large number of situations in
CPython, but it isn't a good thing to rely on. It only works in simple
operations where you aren't accidentally storing a reference somewhere
else. You have to keep track of the references in your head to make sure
things will get closed at proper times.

The 'with' statement clearly defines when resources should be closed, so
it's preferred (as I see you've adopted from other responses). But it's
also needed in other Python implementations which might not follow
CPython's reference counting scheme.
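
As a sketch of what that looks like for the loop shown earlier, each file can be opened in its own 'with' block so that it is closed before the next iteration starts, whatever the implementation. The file names and the handles list are hypothetical and exist only so the result can be inspected; text mode "w" is used here so the example also runs on Python 3.

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()
names = ["a.txt", "b.txt", "c.txt"]   # hypothetical file names

handles = []
for name in names:
    with open(os.path.join(out_dir, name), "w") as f:
        f.write("blah")
    # Leaving the with block has already closed f at this point.
    handles.append(f)    # kept only so the sketch can inspect them afterwards

print(all(h.closed for h in handles))
```

Even the last file is closed here, without relying on reference counting or on the end of the enclosing function.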

I'm not giving further feedback because MRAB caught everything I thought
was an issue.

-- 
Stephen Hansen
  m e @ i x o k a i . i o



#108253

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2016-05-07 18:24 +1200
Message-ID: <dp5g1fF7ndoU1@mid.individual.net>
In reply to: #108247
DFS wrote:
> Maybe it worked because the last time the file was written to was in a 
> for loop, so I got lucky and the files weren't truncated?  Don't know.

It "works" because CPython disposes of objects as soon
as they are not referenced anywhere. Other implementations
of Python (e.g. Jython, PyPy) might not do that.

-- 
Greg



#108264

From: alister <alister.ware@ntlworld.com>
Date: 2016-05-07 08:51 +0000
Message-ID: <G0iXy.190130$rZ7.62030@fx33.am4>
In reply to: #108253
On Sat, 07 May 2016 18:24:45 +1200, Gregory Ewing wrote:

> DFS wrote:
>> Maybe it worked because the last time the file was written to was in a
>> for loop, so I got lucky and the files weren't truncated?  Don't know.
> 
> It "works" because CPython disposes of objects as soon as they are not
> referenced anywhere. Other implementations of Python (e.g. Jython, PyPy)
> might not do that.

To provide an example, try the following code in the interactive 
interpreter:

>>> f = open('somefile', 'w')
>>> print f.write('line 1\n')
None
>>> print f.close
<built-in method close of file object at 0x7fb4c9580660>
>>> print f.write('line 2\n')
None
>>> print f.close()
None
>>> print f.write('line 3\n')
ValueError: I/O operation on closed file

somefile will contain

line 1
line 2





-- 
manager in the cable duct



#108268

From: Peter Otten <__peter__@web.de>
Date: 2016-05-07 15:59 +0200
Message-ID: <mailman.454.1462629557.32212.python-list@python.org>
In reply to: #108236
DFS wrote:

> getAddresses.py
> 
> Scrapes addresses from www.usdirectory.com and stores them in a SQLite
> database, or writes them to text files for mailing labels, etc
> 
> Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
> out how many Taco Bells are within 10 miles of you, and store all the
> addresses in your own address database.
> 
> No more convoluted Googling, or hitting the 'Next Page' button, or
> fumbling with the Yellow Pages...
> 
> Note: the db structure is flat on purpose, and the .csv files aren't
> quote delimited.
> 
> Put the program in its own directory.  It creates the SQLite database
> there, and writes files there, too.
> 
> Reviews of code, bug reports, criticisms, suggestions for improvement,
> etc are all welcome.

- Avoid module-level code and global variables
- Use functions that do one thing and operate on explicitly passed arguments
- You have 

if store == ...:
   ...

sprinkled across your module. You will have to change your code in many 
places if you want to add another output format. With a linear structure 
like

import itertools

STORE_FUNCS = {
    "db": store_in_db,
    "txt": store_as_text,
    "csv": store_as_csv,
}

def main():
    args = read_arguments()
    records = read_records(args)
    records = unique(records)
    if args.limit:
        records = itertools.islice(records, args.limit)
    STORE_FUNCS[args.storage_format](args, records)

if __name__ == "__main__":
    main()

further enhancements will be a lot easier to implement.

The main() function avoids accidental uncontrolled globals. If you want one 
you have to declare it:

def main():
    global verbose
    args = read_arguments()
    verbose = args.verbose
    ...
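
To make the outline above concrete, here is a small runnable sketch of the dispatch-table idea, written for Python 3. Everything beyond the names used in the outline is an assumption made for illustration: the records are (name, street) tuples, the store functions write to an in-memory buffer (args.out) instead of real files or a database, and argparse.Namespace stands in for whatever read_arguments() would return.

```python
import argparse
import io
import itertools


def unique(records):
    """Yield each record once, preserving first-seen order."""
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record


def store_as_text(args, records):
    # Hypothetical text layout: name, street, blank line.
    for name, street in records:
        args.out.write("%s\n%s\n\n" % (name, street))


def store_as_csv(args, records):
    # Hypothetical CSV layout, unquoted like the original program's output.
    args.out.write("Name,Address\n")
    for name, street in records:
        args.out.write("%s,%s\n" % (name, street))


# Adding an output format means adding one function and one entry here.
STORE_FUNCS = {"txt": store_as_text, "csv": store_as_csv}


def run(args, records):
    records = unique(records)
    if args.limit:
        records = itertools.islice(records, args.limit)
    STORE_FUNCS[args.storage_format](args, records)


# Made-up sample data containing one duplicate.
sample = [
    ("Taco Bell", "1 Main St"),
    ("Taco Bell", "1 Main St"),
    ("KFC", "2 Oak Ave"),
    ("Arby's", "3 Elm Rd"),
]
args = argparse.Namespace(storage_format="csv", limit=2, out=io.StringIO())
run(args, sample)
print(args.out.getvalue())
```

Because deduplication and the limit are applied before dispatch, each store function only has to know how to write records in its own format.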


