| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2016-05-07 12:51 -0400 |
| Last post | 2016-05-08 18:24 -0400 |
| Articles | 20 on this page of 69 — 19 participants |
pylint woes DFS <nospam@dfs.com> - 2016-05-07 12:51 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 03:01 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-07 21:16 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 11:36 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-07 22:15 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 12:50 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-10 18:36 -0400
Re: pylint woes MRAB <python@mrabarnett.plus.com> - 2016-05-11 02:02 +0100
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-07 19:14 -0700
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-07 23:04 -0400
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-07 20:46 -0700
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 10:26 -0400
Re: pylint woes Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-08 08:50 +0300
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 10:25 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-09 00:36 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 11:06 -0400
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-08 08:15 -0700
Re: pylint woes Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-05-09 13:17 +1200
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-09 12:18 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 22:58 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-09 01:15 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:06 -0400
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-08 08:11 -0700
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-09 01:51 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:04 -0400
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-09 13:09 +1000
Re: pylint woes MRAB <python@mrabarnett.plus.com> - 2016-05-08 03:21 +0100
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-08 21:36 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:24 -0400
Re: pylint woes Joel Goldstick <joel.goldstick@gmail.com> - 2016-05-08 17:39 -0400
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-09 13:46 +1000
Re: pylint woes Michael Selik <michael.selik@gmail.com> - 2016-05-07 18:42 +0000
Re: pylint woes Peter Pearson <pkpearson@nowhere.invalid> - 2016-05-07 18:43 +0000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:05 -0400
Re: pylint woes Christopher Reimer <christopher_reimer@icloud.com> - 2016-05-07 11:52 -0700
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-07 23:38 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 13:56 +1000
Re: pylint woes Peter Otten <__peter__@web.de> - 2016-05-08 16:19 +0200
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-07 12:21 -0700
Re: pylint woes Stephen Hansen <me@ixokai.io> - 2016-05-07 12:23 -0700
Re: pylint woes Terry Reedy <tjreedy@udel.edu> - 2016-05-07 15:40 -0400
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-07 23:28 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 13:51 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 00:40 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 14:55 +1000
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-07 20:55 -0700
Re: pylint woes Ian Kelly <ian.g.kelly@gmail.com> - 2016-05-07 23:09 -0600
Re: pylint woes Peter Otten <__peter__@web.de> - 2016-05-08 16:12 +0200
Re: pylint woes Christopher Reimer <christopher_reimer@icloud.com> - 2016-05-07 12:43 -0700
Re: pylint woes Ray Cote <rgacote@appropriatesolutions.com> - 2016-05-07 15:52 -0400
Re: pylint woes Christopher Reimer <christopher_reimer@icloud.com> - 2016-05-07 13:20 -0700
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 07:56 +1000
Re: pylint woes Terry Reedy <tjreedy@udel.edu> - 2016-05-07 21:44 -0400
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-08 13:25 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 00:10 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 14:21 +1000
Re: pylint woes "D'Arcy J.M. Cain" <darcy@VybeNetworks.com> - 2016-05-08 08:50 -0400
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-08 23:01 +1000
Re: pylint woes Larry Hudson <orgnut@yahoo.com> - 2016-05-08 13:45 -0700
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-09 08:07 +1000
Re: pylint woes Larry Hudson <orgnut@yahoo.com> - 2016-05-08 18:28 -0700
Re: pylint woes Dan Sommers <dan@tombstonezero.net> - 2016-05-08 20:49 +0000
Re: pylint woes Chris Angelico <rosuav@gmail.com> - 2016-05-09 08:10 +1000
Re: pylint woes Steven D'Aprano <steve@pearwood.info> - 2016-05-09 03:25 +1000
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:16 -0400
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-08 14:38 -0700
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 17:46 -0400
Re: pylint woes Stephen Hansen <me+python@ixokai.io> - 2016-05-08 15:05 -0700
Re: pylint woes DFS <nospam@dfs.com> - 2016-05-08 18:24 -0400
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-09 01:15 +1000 |
| Message-ID | <mailman.518.1462720540.32212.python-list@python.org> |
| In reply to | #108358 |
On Mon, May 9, 2016 at 1:06 AM, DFS <nospam@dfs.com> wrote:
> On 5/8/2016 10:36 AM, Chris Angelico wrote:
>>
>> On Mon, May 9, 2016 at 12:25 AM, DFS <nospam@dfs.com> wrote:
>>>
>>> for category,name,street,city,state,zipcode in ziplists:
>>>     try: db.execute(cSQL, vals)
>>>     except (pyodbc.Error) as programError:
>>>         if str(programError).find("UNIQUE constraint failed") > 0:
>>>             dupeRow = True
>>>             dupes +=1
>>>             print " * duplicate address found: "+name+", "+street
>>>         else:
>>>             pyodbcErr = True
>>>             print "ODBC error: %s " % programError
>>> conn.commit()
>>> --------------------------------------------------------------------
>>>
>>
>> ... and then you just commit???!?
>>
>> ChrisA
>
>
>
> That's what commit() does.
Yes. Even if you got an error part way through, you just blithely commit. What?!
And yes, I am flat-out boggling at this.
ChrisA
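The objection above is that conn.commit() runs no matter what happened inside the loop. A minimal sketch of one alternative follows, written for Python 3 and using the standard-library sqlite3 module as a stand-in for pyodbc (an assumption; the code under discussion uses pyodbc on Python 2). The table layout and the helper name insert_addresses are hypothetical. Duplicates are counted and skipped, while any other database error rolls the batch back instead of committing it.

```python
import sqlite3


def insert_addresses(conn, rows):
    """Insert rows, counting duplicates; roll back on any other DB error."""
    sql = "INSERT INTO addresses VALUES (?,?,?,?,?)"
    dupes = 0
    try:
        for row in rows:
            try:
                conn.execute(sql, row)
            except sqlite3.IntegrityError:
                # Assumed to mean a UNIQUE violation here, i.e. a duplicate row.
                dupes += 1
    except sqlite3.Error:
        # Unexpected failure: discard the partial batch rather than commit it.
        conn.rollback()
        raise
    conn.commit()
    return dupes


# Hypothetical table with a UNIQUE constraint on (name, street).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (name, street, city, state, zipcode, "
    "UNIQUE (name, street))"
)
conn.commit()

rows = [
    ("Ann", "1250 Peachtree Rd", "Atlanta", "GA", "30303"),
    ("Ann", "1250 Peachtree Rd", "Atlanta", "GA", "30303"),  # duplicate
    ("Bob", "12 Main St", "Macon", "GA", "31201"),
]
dupes = insert_addresses(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
print(dupes, stored)
```

Catching the exception class rather than searching the message text for "UNIQUE constraint failed" is a design choice of this sketch, not something the thread prescribes; whether pyodbc exposes an equally specific class for a given driver would need checking.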
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-08 17:06 -0400 |
| Message-ID | <ngo9ie$ggs$3@dont-email.me> |
| In reply to | #108362 |
On 5/8/2016 11:15 AM, Chris Angelico wrote:
> On Mon, May 9, 2016 at 1:06 AM, DFS <nospam@dfs.com> wrote:
>> On 5/8/2016 10:36 AM, Chris Angelico wrote:
>>>
>>> On Mon, May 9, 2016 at 12:25 AM, DFS <nospam@dfs.com> wrote:
>>>>
>>>> for category,name,street,city,state,zipcode in ziplists:
>>>>     try: db.execute(cSQL, vals)
>>>>     except (pyodbc.Error) as programError:
>>>>         if str(programError).find("UNIQUE constraint failed") > 0:
>>>>             dupeRow = True
>>>>             dupes +=1
>>>>             print " * duplicate address found: "+name+", "+street
>>>>         else:
>>>>             pyodbcErr = True
>>>>             print "ODBC error: %s " % programError
>>>> conn.commit()
>>>> --------------------------------------------------------------------
>>>>
>>>
>>> ... and then you just commit???!?
>>>
>>> ChrisA
>>
>>
>>
>> That's what commit() does.
>
> Yes. Even if you got an error part way through, you just blithely commit. What?!
>
> And yes, I am flat-out boggling at this.
>
> ChrisA
I'm boggling that you're boggling.
| From | Stephen Hansen <me+python@ixokai.io> |
|---|---|
| Date | 2016-05-08 08:11 -0700 |
| Message-ID | <mailman.516.1462720304.32212.python-list@python.org> |
| In reply to | #108355 |
On Sun, May 8, 2016, at 07:25 AM, DFS wrote:
> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):
> > for vals in zip(nms,street,city,state,zipcd):
> > nm,street,city,state,zipcd = vals
> > cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>
>
> I like the first one better. python is awesome, but too many options
> for doing the same thing also makes it difficult. For me, anyway.
Eeh, now you're just making trouble for yourself.
    for name, street, city, state, zipcd in zip(names, streets, cities,
                                                states, zipcds):
        ....
may be sorta vaguely long, but it's not that long. Just do it and move
on. Get over whatever makes you not like it.
--
Stephen Hansen
m e @ i x o k a i . i o
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-09 01:51 +1000 |
| Message-ID | <572f607f$0$1588$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108355 |
On Mon, 9 May 2016 12:25 am, DFS wrote:
>>> for j in range(len(nms)):
>>>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>>     vals = nms[j],street[j],city[j],state[j],zipcd[j]
Why are you assigning cSQL to the same string over and over again?
Sure, assignments are cheap, but they're not infinitely cheap. They still
have a cost. Instead of paying that cost once, you pay it over and over
again, which adds up.
Worse, it is misleading. I had to read that code snippet three or four times
before I realised that cSQL was exactly the same each time.
> I tried:
>
> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):
>
> but felt it was too long and wordy.
It's long and wordy because you're doing something long and wordy. It is
*inherently* long and wordy to process five things, whether you write it
as:
    for i in range(len(names)):
        name = names[i]
        street = streets[i]
        city = cities[i]
        state = states[i]
        zipcode = zipcodes[i]
        process(...)
or as:
    for name, street, city, state, zipcode in zip(
            names, streets, cities, states, zipcodes
            ):
        process(...)
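To make the comparison concrete, here is a small runnable sketch (Python 3, with made-up sample data) showing that the index loop and the zip() loop visit exactly the same tuples. The list contents are illustrative assumptions only.

```python
names = ["Ann", "Bob"]
streets = ["1250 Peachtree Rd", "12 Main St"]
cities = ["Atlanta", "Macon"]
states = ["GA", "GA"]
zipcodes = ["30303", "31201"]

# Index-based loop: every value is fetched through the counter i.
by_index = []
for i in range(len(names)):
    by_index.append((names[i], streets[i], cities[i], states[i], zipcodes[i]))

# zip-based loop: each pass hands over one item from every list.
by_zip = []
for name, street, city, state, zipcode in zip(
        names, streets, cities, states, zipcodes):
    by_zip.append((name, street, city, state, zipcode))

print(by_index == by_zip)
```

One behavioural difference worth knowing: zip() stops at the shortest input, whereas the index loop raises IndexError if one of the other lists is shorter than names.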
> I like the first one better. python is awesome, but too many options
> for doing the same thing also makes it difficult. For me, anyway.
That's the difference between a master and an apprentice. The apprentice
likes to follow fixed steps the same way each time. The master craftsman
knows her tools backwards, and can choose the right tool for the job, and
when the choice of tool really doesn't matter and you can use whatever
happens to be the closest to hand.
--
Steven
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-08 17:04 -0400 |
| Message-ID | <ngo9dn$ggs$1@dont-email.me> |
| In reply to | #108365 |
On 5/8/2016 11:51 AM, Steven D'Aprano wrote:
> On Mon, 9 May 2016 12:25 am, DFS wrote:
>
>>>> for j in range(len(nms)):
>>>>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>>>     vals = nms[j],street[j],city[j],state[j],zipcd[j]
>
> Why are you assigning cSQL to the same string over and over again?
I like it in close proximity to the vals statement.
> Sure, assignments are cheap, but they're not infinitely cheap. They still
> have a cost. Instead of paying that cost once, you pay it over and over
> again, which adds up.
Adds up to what?
> Worse, it is misleading. I had to read that code snippet three or four times
> before I realised that cSQL was exactly the same each time.
You had to read 5 words three or four times? Seriously?
>> I tried:
>>
>> for nm,street,city,state,zipcd in zip(nms,street,city,state,zipcd):
>>
>> but felt it was too long and wordy.
>
> It's long and wordy because you're doing something long and wordy. It is
> *inherently* long and wordy to process five things, whether you write it
> as:
>
> for i in range(len(names)):
>     name = names[i]
>     street = streets[i]
>     city = cities[i]
>     state = states[i]
>     zipcode = zipcodes[i]
>     process(...)
>
> or as:
>
> for name, street, city, state, zipcode in zip(
>         names, streets, cities, states, zipcodes
>         ):
>     process(...)
I like mine best of all:
    ziplists = zip(names,streets,cities,states,zipcodes)
    for name,street,city,state,zipcode in ziplists:
>> I like the first one better. python is awesome, but too many options
>> for doing the same thing also makes it difficult. For me, anyway.
>
>
> That's the difference between a master and an apprentice. The apprentice
> likes to follow fixed steps the same way each time. The master craftsman
> knows her tools backwards, and can choose the right tool for the job, and
> when the choice of tool really doesn't matter and you can use whatever
> happens to be the closest to hand.
"her tools"... you're a woman?
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-09 13:09 +1000 |
| Message-ID | <572fff51$0$1586$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108384 |
On Mon, 9 May 2016 07:04 am, DFS wrote:
> On 5/8/2016 11:51 AM, Steven D'Aprano wrote:
>> On Mon, 9 May 2016 12:25 am, DFS wrote:
>>
>>>>> for j in range(len(nms)):
>>>>>     cSQL = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
>>>>>     vals = nms[j],street[j],city[j],state[j],zipcd[j]
>>
>> Why are you assigning cSQL to the same string over and over again?
>
> I like it in close proximity to the vals statement.
The line immediately above the loop is in close proximity. Is that not close
enough?
>> Sure, assignments are cheap, but they're not infinitely cheap. They still
>> have a cost. Instead of paying that cost once, you pay it over and over
>> again, which adds up.
>
> Adds up to what?
Potentially a significant waste of time. You know what they say about
financial waste: "a million here, a million there, and soon we're talking
about real money".
The first point is that this is a micro-pessimisation: even though it
doesn't cost much individually, it's still a needless expense that your
program keeps paying. It's like friction on your code.
Python is neither the fastest nor the slowest language available. It is
often "fast enough", but it is also an easy language to write slow code in.
If you were programming in C, the compiler would almost surely see that the
assignment was to a constant, and automatically and silently hoist it
outside the loop. That's one reason why C is so fast: it aggressively
optimizes your code. (Too aggressively, in my opinion, but that's another
story.) But the Python compiler isn't that sophisticated. It's up to us,
the programmers, to be mindful of the friction we add to our code, because
if we aren't mindful of it, we can easily end up with needlessly slow code.
But as I said, that's not the major problem with doing the assignment in the
loop. It's more about readability and your reader's expectations than the
extra time it costs.
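For anyone who wants to measure the "friction" rather than argue about it, here is a rough sketch using the standard-library timeit module (Python 3). The function names and the sample row are hypothetical, and no particular timing is promised: the figures depend on the machine and interpreter, and the gap is usually small.

```python
import timeit


def assign_inside(rows):
    out = []
    for row in rows:
        # The name is rebound to the same constant on every pass.
        sql = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
        out.append((sql, row))
    return out


def assign_outside(rows):
    # The name is bound once, on the line immediately above the loop.
    sql = "INSERT INTO ADDRESSES VALUES (?,?,?,?,?)"
    out = []
    for row in rows:
        out.append((sql, row))
    return out


rows = [("Ann", "1250 Peachtree Rd", "Atlanta", "GA", "30303")] * 1000

# Both versions build exactly the same result.
same = assign_inside(rows) == assign_outside(rows)

inside_time = timeit.timeit(lambda: assign_inside(rows), number=200)
outside_time = timeit.timeit(lambda: assign_outside(rows), number=200)
print(same, inside_time, outside_time)
```

Since the results are identical, the choice comes down to the readability argument made above: a reader who sees the assignment above the loop knows at a glance that it never changes.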
>> Worse, it is misleading. I had to read that code snippet three or four
>> times before I realised that cSQL was exactly the same each time.
>
> You had to read 5 words three or four times? Seriously?
Yes, seriously, because when most people read code they skim it, speed
reading, looking for salient points of interest. They don't point their
finger under each word and read it aloud syllable by syllable like a
pre-schooler with reading difficulties. (At least I don't, I can't speak
for others.) So the first couple of times I glanced at it, it just looked
like any other assignment without the details registering.
Then the next couple of times I thought that it must be *me* making the
mistake, I must be reading it wrong. Maybe there's something I missed? I
read code with the default assumption that it is more or less sensible.
Over the various posts and replies to posts, I had probably glanced at that
line a dozen times, and then read it more carefully three or four times,
before I was sure I had read what I thought I had read.
[...]
> I like mine best of all:
>
> ziplists = zip(names,streets,cities,states,zipcodes)
> for name,street,city,state,zipcode in ziplists:
Using a temporary, single-use variable is okay, but it's often unnecessary.
But be aware that there is a particular risk with loops. You may be tempted
to think you can re-use ziplists to iterate over it twice:
    data = zip(names, streets, cities, states, zipcodes)
    for a, b, c, d, e in data:
        do_stuff()
    ...
    # later
    for v, w, x, y, z in data:
        do_something_else()
and that's perfectly fine, *but* there is a risk that if the for loop data
being iterated over is an iterator, it will have been exhausted by the
first loop and the second loop won't run at all.
In Python 2, zip() returns a list, but in Python 3, it returns an iterator.
So beware of using such temp variables unless you know what you're doing.
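A short Python 3 sketch of the trap just described, with made-up sample lists. The second pass over the zip object sees nothing because the first pass used it up; wrapping the zip in list() is one way to get something reusable.

```python
names = ["Ann", "Bob"]
cities = ["Atlanta", "Macon"]

# In Python 3, zip() returns a one-shot iterator.
pairs = zip(names, cities)
first_pass = [pair for pair in pairs]
second_pass = [pair for pair in pairs]  # iterator already exhausted

# Materialising the pairs as a list allows repeated iteration.
reusable = list(zip(names, cities))
third_pass = [pair for pair in reusable]
fourth_pass = [pair for pair in reusable]

print(first_pass, second_pass)
print(third_pass == fourth_pass)
```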
>>> I like the first one better. python is awesome, but too many options
>>> for doing the same thing also makes it difficult. For me, anyway.
>>
>>
>> That's the difference between a master and an apprentice. The apprentice
>> likes to follow fixed steps the same way each time. The master craftsman
>> knows her tools backwards, and can choose the right tool for the job, and
>> when the choice of tool really doesn't matter and you can use whatever
>> happens to be the closest to hand.
>
> "her tools"... you're a woman?
What makes you think I was talking about myself? I was talking about people
in general -- when we are beginners, it's only natural that (like most
beginners) we're more comfortable with a limited amount of choice. It's hard
to remember what option to use when there's only one, let alone when
there's ten. But as we progress to mastery of the language, we'll come to
understand the subtle differences between options, when they matter, and
when they don't.
If I had said "his tools", would you have thought I was talking specifically
about a man?
Does it matter if I'm a woman? Would that make my advice better or worse?
--
Steven
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-05-08 03:21 +0100 |
| Message-ID | <mailman.492.1462674102.32212.python-list@python.org> |
| In reply to | #108311 |
On 2016-05-08 03:14, Stephen Hansen wrote:
> On Sat, May 7, 2016, at 06:16 PM, DFS wrote:
>
>> Why is it better to zip() them up and use:
>>
>> for item1, item2, item3 in zip(list1, list2, list3):
>>     do something with the items
>>
>> than
>>
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>
> Although Chris has a perfectly good and valid answer why conceptually
> the zip is better, let me put forth: the zip is simply clearer, more
> readable and more maintainable.
>
> This is a question of style and to a certain degree aesthetics, so is
> somewhat subjective, but range(len(list1)) and list1[j] are all
> indirection, when item1 is clearly (if given a better name than 'item1')
> something distinct you're working on.
>
+1
If you're iterating through multiple sequences in parallel, zip is the
way to go.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-08 21:36 +1000 |
| Message-ID | <572f24bc$0$1618$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108311 |
On Sun, 8 May 2016 11:16 am, DFS wrote:
> address data is scraped from a website:
>
> names = tree.xpath()
> addr = tree.xpath()
Why are you scraping the data twice?
names = addr = tree.xpath()
or if you prefer the old-fashioned:
names = tree.xpath()
addr = names
but that raises the question, how can you describe the same set of data as
both "names" and "addr[esses]" and have them both be accurate?
> I want to store the data atomically,
I'm not really sure what you mean by "atomically" here. I know what *I* mean
by "atomically", which is to describe an operation which either succeeds
entirely or fails. But I don't know what you mean by it.
> so I parse street, city, state, and
> zip into their own lists.
None of which is atomic.
> "1250 Peachtree Rd, Atlanta, GA 30303
>
> street = [s.split(',')[0] for s in addr]
> city = [c.split(',')[1].strip() for c in addr]
> state = [s[-8:][:2] for s in addr]
> zipcd = [z[-5:] for z in addr]
At this point, instead of iterating over the same list four times, doing the
same thing over and over again, you should do things the old-fashioned way:
    streets, cities, states, zipcodes = [], [], [], []
    for word in addr:
        items = word.split(',')
        streets.append(items[0])
        cities.append(items[1].strip())
        states.append(word[-8:-6])
        zipcodes.append(word[-5:])
Oh, and use better names. "street" is a single street, not a list of
streets, note plural.
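A runnable Python 3 version of the single-pass idea, assuming every address has exactly the shape "street, city, ST 12345" with no trailing whitespace (the second sample address is made up). The state is taken with the slice [-8:-6], which picks the same two characters as the original s[-8:][:2].

```python
addr = [
    "1250 Peachtree Rd, Atlanta, GA 30303",
    "12 Main St, Macon, GA 31201",
]

streets, cities, states, zipcodes = [], [], [], []
for line in addr:
    items = line.split(',')
    streets.append(items[0])
    cities.append(items[1].strip())
    states.append(line[-8:-6])   # the two letters before " 12345"
    zipcodes.append(line[-5:])   # the last five characters

print(streets, cities, states, zipcodes)
```

The fixed-width slices break as soon as an address carries a ZIP+4 code or trailing spaces, so real scraped data would likely need validation before this step.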
--
Steven
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-08 17:24 -0400 |
| Message-ID | <ngoajd$kp6$1@dont-email.me> |
| In reply to | #108347 |
On 5/8/2016 7:36 AM, Steven D'Aprano wrote:
> On Sun, 8 May 2016 11:16 am, DFS wrote:
>
>> address data is scraped from a website:
>>
>> names = tree.xpath()
>> addr = tree.xpath()
>
> Why are you scraping the data twice?
Because it exists in 2 different sections of the document.
names = tree.xpath('//span[@class="header_text3"]/text()')
addresses = tree.xpath('//span[@class="text3"]/text()')
I thought you were a "master who knew her tools", and I was the
apprentice?
So why did "the master" think xpath() was magic?
> names = addr = tree.xpath()
>
> or if you prefer the old-fashioned:
>
> names = tree.xpath()
> addr = names
>
> but that raises the question, how can you describe the same set of data as
> both "names" and "addr[esses]" and have them both be accurate?
>
>
>> I want to store the data atomically,
>
> I'm not really sure what you mean by "atomically" here. I know what *I* mean
> by "atomically", which is to describe an operation which either succeeds
> entirely or fails.
That's atomicity.
> But I don't know what you mean by it.
http://www.databasedesign-resource.com/atomic-database-values.html
>> so I parse street, city, state, and
>> zip into their own lists.
>
> None of which is atomic.
All of which are atomic.
>> "1250 Peachtree Rd, Atlanta, GA 30303
>>
>> street = [s.split(',')[0] for s in addr]
>> city = [c.split(',')[1].strip() for c in addr]
>> state = [s[-8:][:2] for s in addr]
>> zipcd = [z[-5:] for z in addr]
>
> At this point, instead of iterating over the same list four times, doing the
> same thing over and over again, you should do things the old-fashioned way:
>
> streets, cities, states, zipcodes = [], [], [], []
> for word in addr:
>     items = word.split(',')
>     streets.append(items[0])
>     cities.append(items[1].strip())
>     states.append(word[-8:-6])
>     zipcodes.append(word[-5:])
That's a good one.
Chris Angelico mentioned something like that, too, and I already put it
in place.
> Oh, and use better names. "street" is a single street, not a list of
> streets, note plural.
I'll use whatever names I like.
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2016-05-08 17:39 -0400 |
| Message-ID | <mailman.531.1462743587.32212.python-list@python.org> |
| In reply to | #108388 |
On Sun, May 8, 2016 at 5:24 PM, DFS <nospam@dfs.com> wrote:
> On 5/8/2016 7:36 AM, Steven D'Aprano wrote:
>>
>> On Sun, 8 May 2016 11:16 am, DFS wrote:
>>
>>> address data is scraped from a website:
>>>
>>> names = tree.xpath()
>>> addr = tree.xpath()
>>
>>
>> Why are you scraping the data twice?
>
>
>
> Because it exists in 2 different sections of the document.
>
> names = tree.xpath('//span[@class="header_text3"]/text()')
> addresses = tree.xpath('//span[@class="text3"]/text()')
>
>
> I thought you were a "master who knew her tools", and I was the apprentice?
>
> So why did "the master" think xpath() was magic?
>
>
>
>
>
>
>> names = addr = tree.xpath()
>>
>> or if you prefer the old-fashioned:
>>
>> names = tree.xpath()
>> addr = names
>>
>> but that raises the question, how can you describe the same set of data as
>> both "names" and "addr[esses]" and have them both be accurate?
>>
>>
>>> I want to store the data atomically,
>>
>>
>> I'm not really sure what you mean by "atomically" here. I know what *I*
>> mean
>> by "atomically", which is to describe an operation which either succeeds
>> entirely or fails.
>
>
> That's atomicity.
>
>
>
>> But I don't know what you mean by it.
>
> http://www.databasedesign-resource.com/atomic-database-values.html
>
>
>
>>> so I parse street, city, state, and
>>> zip into their own lists.
>>
>>
>> None of which is atomic.
>
>
> All of which are atomic.
>
>
>
>>> "1250 Peachtree Rd, Atlanta, GA 30303
>>>
>>> street = [s.split(',')[0] for s in addr]
>>> city = [c.split(',')[1].strip() for c in addr]
>>> state = [s[-8:][:2] for s in addr]
>>> zipcd = [z[-5:] for z in addr]
>>
>>
>> At this point, instead of iterating over the same list four times, doing
>> the
>> same thing over and over again, you should do things the old-fashioned
>> way:
>>
>> streets, cities, states, zipcodes = [], [], [], []
>> for word in addr:
>>     items = word.split(',')
>>     streets.append(items[0])
>>     cities.append(items[1].strip())
>>     states.append(word[-8:-6])
>>     zipcodes.append(word[-5:])
>
>
>
>
> That's a good one.
>
> Chris Angelico mentioned something like that, too, and I already put it
> in place.
>
>
>
>> Oh, and use better names. "street" is a single street, not a list of
>> streets, note plural.
>
>
>
> I'll use whatever names I like.
>
>
>
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
Starting to look like trolling. Lots of good advice here. If you
ask, and don't like the advice, don't use it.
--
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-09 13:46 +1000 |
| Message-ID | <57300813$0$1620$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108388 |
On Mon, 9 May 2016 07:24 am, DFS wrote:
> On 5/8/2016 7:36 AM, Steven D'Aprano wrote:
>> On Sun, 8 May 2016 11:16 am, DFS wrote:
>>
>>> address data is scraped from a website:
>>>
>>> names = tree.xpath()
>>> addr = tree.xpath()
>>
>> Why are you scraping the data twice?
>
>
> Because it exists in 2 different sections of the document.
>
> names = tree.xpath('//span[@class="header_text3"]/text()')
> addresses = tree.xpath('//span[@class="text3"]/text()')
How was I supposed to know that you were providing two different arguments
to the method? It looked like a pure-function call, which should always
return the same thing. Communication errors are on the sender, not the
receiver.
You didn't say what tree was, so I judged that xpath was some argument-less
method of tree that returned some attribute, sufficiently cleaned up.
It would be more obvious to pass a placeholder argument:
names = tree.xpath(this)
addr = tree.xpath(that)
>>> I want to store the data atomically,
>>
>> I'm not really sure what you mean by "atomically" here. I know what *I*
>> mean by "atomically", which is to describe an operation which either
>> succeeds entirely or fails.
>
> That's atomicity.
Right. And that doesn't apply to the portion of your code we're discussing.
There's no storage involved. The list comps do either succeed entirely or
fail, but until you actually get to writing to the database, there's nothing
that "store the data atomically" would apply to that I saw.
[...]
>>> so I parse street, city, state, and
>>> zip into their own lists.
>>
>> None of which is atomic.
>
> All of which are atomic.
Sorry, my poor choice of words. None of which are atomic *storage*.
>> Oh, and use better names. "street" is a single street, not a list of
>> streets, note plural.
>
> I'll use whatever names I like.
*shrug*
It's your code, you can name all your variables after "Mr. Meeseeks" if you
want.
    for meeseeks, Meeseeks, meeSeeks, MEEseeks in zip(
            MESEEKS, meeeseeks, MeesEeks, MEEsEEkS):
        mEEsEeKSS.append(meeSeeks)
        ...
Just don't ask others to read it.
--
Steven
| From | Michael Selik <michael.selik@gmail.com> |
|---|---|
| Date | 2016-05-07 18:42 +0000 |
| Message-ID | <mailman.459.1462646572.32212.python-list@python.org> |
| In reply to | #108275 |
On Sat, May 7, 2016 at 12:56 PM DFS <nospam@dfs.com> wrote:
> |mixed-indentation        |186         | I always use tab
>
Don't mix tabs and spaces. I suggest selecting all lines and using your
editor to convert spaces to tabs. Usually there's a feature to "tabify".
> +-------------------------+------------+
> |invalid-name             |82          | every single variable name?!
>
Class names should be CamelCase
Everything else should be lowercase_with_underscores
> +-------------------------+------------+
> |bad-whitespace           |65          | mostly because I line up =
>                                          signs:
>                                          var1  = value
>                                          var10 = value
>
Sure, that's your style. But pylint likes a different style. It's good to
use a standard. If it's just you, I suggest conforming to pylint. If you're
already on a team, use your team's standard.
> +-------------------------+------------+
> |trailing-whitespace      |59          | heh!
>
Get rid of it. Save some bytes.
> +-------------------------+------------+
> |multiple-statements      |23          | do this to save lines.
>                                          Will continue doing it.
>
If you want to share your code with others, you should conform to community
standards to make things easier for others to read. Further, if you think
the core contributors are expert programmers, you should probably take
their advice: "sparse is better than dense". Do your future-self a favor and
write one statement per line. Today you find it easy to read. Six months
from now you won't.
> +-------------------------+------------+
> |no-member                |5           |
>
> "Module 'pyodbc' has no 'connect' member"   Yes it does.
> "Module 'pyodbc' has no 'Error' member"     Yes it does.
>
> Issue with pylint, or pyodbc?
>
Not sure. Maybe pyodbc is written in a way that pylint can't see its
connect or Error method/attribute.
> +-------------------------+------------+
> |line-too-long            |5           | meh
>
Yeah, I think 80 characters can be somewhat tight. Still, 5 long lines in
200ish lines of code? Sounds like you might be doing too much in those
lines or have too many levels of indentation.
"Sparse is better than dense"
"Flat is better than nested"
> +-------------------------+------------+
> |wrong-import-order       |4           | does it matter?
>
No. I think pylint likes to alphabetize. With only 4 imports, it doesn't
matter. Still, why not alphabetize?
> +-------------------------+------------+
> |missing-docstring        |4           | what's the difference between
>                                          a docstring and a # comment?
>
Docstrings are tools for introspection. Many things in Python access the
__doc__ attribute to help you. Comments are never seen by module users.
> +-------------------------+------------+
> |superfluous-parens       |3           | I like to surround 'or'
>                                          statements with parens
>
Ok. But over time you'll get used to not needing them. Edward Tufte says
you should have a high "information-to-ink" ratio.
> +-------------------------+------------+
> |redefined-outer-name     |3           | fixed. changed local var names.
> +-------------------------+------------+
> |redefined-builtin        |2           | fixed. Was using 'zip' and 'id'
> +-------------------------+------------+
> |multiple-imports         |2           | doesn't everyone?
>
Yeah, I do that as well.
> +-------------------------+------------+
> |consider-using-enumerate |2           | see below [1]
>
As Chris explained.
> +-------------------------+------------+
> |bad-builtin              |2           | warning because I used filter?
>
I think pylint likes comprehensions better. IMHO filter is OK. If you're
using a lambda, change to a comprehension.
> +-------------------------+------------+
> |unused-import            |1           | fixed
> +-------------------------+------------+
> |unnecessary-pass         |1           | fixed. left over from
>                                          Try..Except
> +-------------------------+------------+
> |missing-final-newline    |1           | I'm using Notepad++, with
>                                          EOL Conversion set to
>                                          'Windows Format'. How
>                                          or should I fix this?
>
Add a few blank lines to the end of your file.
> +-------------------------+------------+
> |fixme                    |1           | a TODO statement
> +-------------------------+------------+
>
> Global evaluation
> -----------------
> Your code has been rated at -7.64/10
>
>
>
> I assume -7.64 is really bad?
>
> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?
>
I'm certain of it.
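On the docstring question, a tiny Python 3 sketch may help. The function parse_zip is hypothetical; the point is only that the docstring is stored on the object as __doc__ (which is what help() displays), while the # comment exists only in the source file.

```python
def parse_zip(address):
    """Return the last five characters of address as the ZIP code."""
    # This comment is visible only to someone reading the source.
    return address[-5:]


doc = parse_zip.__doc__
zip_code = parse_zip("1250 Peachtree Rd, Atlanta, GA 30303")
print(doc)
print(zip_code)
```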
| From | Peter Pearson <pkpearson@nowhere.invalid> |
|---|---|
| Date | 2016-05-07 18:43 +0000 |
| Message-ID | <dp6ravFfuq0U1@mid.individual.net> |
| In reply to | #108275 |
On Sat, 7 May 2016 12:51:00 -0400, DFS <nospam@dfs.com> wrote:

> This more-anal-than-me program generated almost 2 warnings for every
> line of code in my program. w t hey?

Thank you for putting a sample of pylint output in front of my eyes; you inspired me to install pylint and try it out. If it teaches me even half as much as it's teaching you, I'll consider it a great blessing.

--
To email me, substitute nowhere->runbox, invalid->com.
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-08 17:05 -0400 |
| Message-ID | <ngo9g8$ggs$2@dont-email.me> |
| In reply to | #108280 |
On 5/7/2016 2:43 PM, Peter Pearson wrote:

> On Sat, 7 May 2016 12:51:00 -0400, DFS <nospam@dfs.com> wrote:
>> This more-anal-than-me program generated almost 2 warnings for every
>> line of code in my program. w t hey?
>
> Thank you for putting a sample of pylint output in front of my eyes;
> you inspired me to install pylint and try it out. If it teaches me even
> half as much as it's teaching you, I'll consider it a great blessing.

Cool. I don't agree with some of them, but there's no doubt adhering to them will result in more well-formed code.
| From | Christopher Reimer <christopher_reimer@icloud.com> |
|---|---|
| Date | 2016-05-07 11:52 -0700 |
| Message-ID | <mailman.461.1462647153.32212.python-list@python.org> |
| In reply to | #108275 |
On 5/7/2016 9:51 AM, DFS wrote:

> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?

I routinely get 10/10 for my code. While pylint isn't perfect and idiosyncratic at times, it's a useful tool to help break bad programming habits. Since I came from a Java background, I had to unlearn everything from Java before I could write Pythonic code. It might help to use an IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).

> That's about as good as it's gonna get!

You can do better. You should strive for 10/10 whenever possible, figure out why you fall short and ask for help on the parts that don't make sense.

> pylint says "Consider using enumerate instead of iterating with range
> and len"
>
> the offending code is:
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.

This code is reeking with bad habits to be broken. Assigning a throwaway variable to walk the index is unnecessary when Python can do it for you behind the scenes. As Chris A. pointed out in his post, you should use zip() to walk through the values of each list at the same time.

Thank you,

Chris R.
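A minimal sketch of the zip() suggestion, next to the index-based loop it would replace. The list names and their contents are invented here, since the original program is not shown.

```python
names = ["alpha", "beta", "gamma"]
ports = [80, 443, 8080]
flags = [True, False, True]

# Index-based version, in the style of the loop pylint flagged.
by_index = []
for j in range(len(names)):
    by_index.append((names[j], ports[j], flags[j]))

# zip() version: each pass hands over one value from every list.
by_zip = []
for name, port, flag in zip(names, ports, flags):
    by_zip.append((name, port, flag))

print(by_index == by_zip)
```

One difference worth keeping in mind: zip() stops at the shortest input, whereas the indexed loop raises IndexError if a later list is shorter than the first.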
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-07 23:38 -0400 |
| Message-ID | <ngmc53$ous$1@dont-email.me> |
| In reply to | #108282 |
On 5/7/2016 2:52 PM, Christopher Reimer wrote:

> On 5/7/2016 9:51 AM, DFS wrote:
>> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
>> program?
>
> I routinely get 10/10 for my code. While pylint isn't perfect and
> idiosyncratic at times, it's a useful tool to help break bad programming
> habits. Since I came from a Java background, I had to unlearn everything
> from Java before I could write Pythonic code. It might help to use an
> IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).
>
>> That's about as good as it's gonna get!
>
> You can do better.

10/10 on pylint isn't better. It's being robotic and conforming to the opinions of the author of that app.

In fact, I think:

    import os, sys, time, socket

is much more readable than, and preferable to,

    import os
    import sys
    import time
    import socket

but pylint complains about the former.

> You should strive for 10/10 whenever possible,

nah

> figure out why you fall short and ask for help on the parts that don't
> make sense.

I actually agree with ~3/4 of the suggestions it makes. My code ran fine before pylint tore it a new one, and it doesn't appear to run any better after making various fixes.

But between you clp guys and pylint, the code is definitely improving.

>> pylint says "Consider using enumerate instead of iterating with range
>> and len"
>>
>> the offending code is:
>> for j in range(len(list1)):
>>     do something with list1[j], list2[j], list3[j], etc.
>
> This code is reeking with bad habits to be broken. Assigning a throwaway
> variable to walk the index is unnecessary when Python can do it for you
> behind the scenes.

Don't you think python also allocates a throwaway variable for use with zip and enumerate()?

> As Chris A. pointed out in his post, you should use
> zip() to walk through the values of each list at the same time.

Yeah, zip looks interesting. I just started using python a month ago, and didn't know about zip until pylint pointed it out (it said I redefined a builtin by using 'zip' as a list name).

Edit: I already put zip() in place. Only improvement I think is it looks cleaner - got rid of a bunch of [j]s.

> Thank you,
>
> Chris R.

No, thank /you/,

DFS
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-05-08 13:56 +1000 |
| Message-ID | <mailman.501.1462679822.32212.python-list@python.org> |
| In reply to | #108328 |
On Sun, May 8, 2016 at 1:38 PM, DFS <nospam@dfs.com> wrote:

>> This code is reeking with bad habits to be broken. Assigning a throwaway
>> variable to walk the index is unnecessary when Python can do it for you
>> behind the scenes.
>
>
> Don't you think python also allocates a throwaway variable for use with zip
> and enumerate()?

Nope. But even if it did, it wouldn't matter. Concern yourself with your code, and let the implementation take care of itself.

ChrisA
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-05-08 16:19 +0200 |
| Message-ID | <mailman.513.1462717365.32212.python-list@python.org> |
| In reply to | #108328 |
DFS wrote:

> On 5/7/2016 2:52 PM, Christopher Reimer wrote:
>> On 5/7/2016 9:51 AM, DFS wrote:
>>> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
>>> program?
>>
>> I routinely get 10/10 for my code. While pylint isn't perfect and
>> idiosyncratic at times, it's a useful tool to help break bad programming
>> habits. Since I came from a Java background, I had to unlearn everything
>> from Java before I could write Pythonic code. It might help to use an
>> IDE that offers PEP8-compliant code suggestions (I use PyCharm IDE).
>>
>>> That's about as good as it's gonna get!
>>
>> You can do better.
>
> 10/10 on pylint isn't better.

Not always, but where you and pylint disagree I'm more likely to side with the tool ;)

> It's being robotic and conforming to the
> opinions of the author of that app.

The problem is the tool's limitations, the "being robotic", rather than following someone else's opinions.

> In fact, I think:
>
> import os, sys, time, socket
>
> is much more readable than, and preferable to,
>
> import os
> import sys
> import time
> import socket
>
> but pylint complains about the former.

Do you use version control?

>> You should strive for 10/10 whenever possible,
>
> nah
>
>
>> figure out why you fall short and ask for help on the parts that don't
>> make sense.
>
> I actually agree with ~3/4 of the suggestions it makes. My code ran
> fine before pylint tore it a new one, and it doesn't appear to run any
> better after making various fixes.

Do you write unit tests?
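One common argument for one import per line, which may or may not be what the version control question is hinting at, is that adding or removing a module then touches exactly one line in a diff. The sketch below uses the standard library's `difflib` to show the effect; the helper `changed_lines` and the sample import lists are hypothetical.

```python
import difflib


def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file headers
    ]


# Adding 'socket' when all imports share one line rewrites that line.
combined = changed_lines(
    ["import os, sys, time"],
    ["import os, sys, time, socket"],
)

# Adding 'socket' with one import per line only adds a line.
separate = changed_lines(
    ["import os", "import sys", "import time"],
    ["import os", "import sys", "import time", "import socket"],
)

print(combined)
print(separate)
```

With the combined form the diff shows a removed line and an added line; with separate lines it shows a single addition, which tends to be easier to review and less likely to conflict when two people add imports at once.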
| From | Stephen Hansen <me+python@ixokai.io> |
|---|---|
| Date | 2016-05-07 12:21 -0700 |
| Message-ID | <mailman.463.1462648882.32212.python-list@python.org> |
| In reply to | #108275 |
Pylint is very opinionated. Feel free to adjust its configuration to suit your opinions of style.

In particular, several of these might be related to PEP8 style issues.

On Sat, May 7, 2016, at 09:51 AM, DFS wrote:

>                                          DFS comments
> +-------------------------+------------+ -------------------------------
> |message id               |occurrences |
> +=========================+============+
> |mixed-indentation        |186         | I always use tab

And yet, it appears there's some space indentation in there. In Notepad++ enable View->Show Symbol->Show White Space and Tab and Show Indent Guide.

> +-------------------------+------------+
> |invalid-name             |82          | every single variable name?!

It probably defaults to PEP8 names, which are variables_like_this, not variablesLikeThis.

> +-------------------------+------------+
> |bad-whitespace           |65          | mostly because I line up =
>                                          signs:
>                                          var1  = value
>                                          var10 = value

Yeah, and PEP8 says don't do that. Adjust the configuration of pylint if you want.

> +-------------------------+------------+
> |multiple-statements      |23          | do this to save lines.
>                                          Will continue doing it.

This you really shouldn't do, imho. Saving lines is not a virtue, readability is -- dense code is by definition less readable.

> +-------------------------+------------+
> |no-member                |5           |
>
>   "Module 'pyodbc' has no 'connect' member"   Yes it does.
>   "Module 'pyodbc' has no 'Error' member"     Yes it does.
>
>   Issue with pylint, or pyodbc?

Pylint.

> +-------------------------+------------+
> |line-too-long            |5           | meh

I'm largely meh on this too. But again it's a PEP8 thing.

> +-------------------------+------------+
> |wrong-import-order       |4           | does it matter?

It's useful to have a standard so you can glance and tell what's what and from where, but what that standard is, is debatable.

> +-------------------------+------------+
> |missing-docstring        |4           | what's the difference between
>                                          a docstring and a # comment?

A docstring is a docstring, a comment is a comment. Google python docstrings :) Python prefers files to have a docstring on top, and functions beneath their definition. Comments should be used as little as possible, as they must be maintained: an incorrect comment is worse than no comment. Go for clear code that doesn't *need* commenting.

> +-------------------------+------------+
> |superfluous-parens       |3           | I like to surround 'or'
>                                          statments with parens

Why?

> +-------------------------+------------+
> |multiple-imports         |2           | doesn't everyone?

I don't actually know what it's complaining about.

> +-------------------------+------------+
> |bad-builtin              |2           | warning because I used filter?

Don't know what it's complaining about here either.

> +-------------------------+------------+
> |missing-final-newline    |1           | I'm using Notepad++, with
>                                          EOL Conversion set to
>                                          'Windows Format'. How
>                                          or should I fix this?

Doesn't have anything to do with it. Just scroll to the bottom and press enter. It wants to end on a newline, not code.

> Global evaluation
> -----------------
> Your code has been rated at -7.64/10
>
> I assume -7.64 is really bad?
>
> Has anyone ever in history gotten 10/10 from pylint for a non-trivial
> program?

No clue, I don't use pylint at all.

> [1]
> pylint says "Consider using enumerate instead of iterating with range
> and len"
>
> the offending code is:
> for j in range(len(list1)):
>     do something with list1[j], list2[j], list3[j], etc.
>
> enumeration would be:
> for j,item in enumerate(list1):
>     do something with list1[j], list2[j], list3[j], etc.
>
> Is there an advantage to using enumerate() here?

It's cleaner, easier to read. In Python 2, where range() returns a list, it's faster. (In Python 2, xrange returns a lazily evaluated range.)

Use the tools Python gives you. Why reinvent enumerate when it's built in?

--
Stephen Hansen
m e @ i x o k a i . i o
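A small sketch of the enumerate() point, assuming a made-up list `colors`. It shows the flagged range(len(...)) pattern and the enumerate() form producing the same result, with the item available directly instead of through an index lookup.

```python
colors = ["red", "green", "blue"]

# The pattern pylint flags: an index used only to look the item up again.
with_range = []
for j in range(len(colors)):
    with_range.append("{}: {}".format(j, colors[j]))

# enumerate() yields the index and the item together.
with_enumerate = []
for j, color in enumerate(colors):
    with_enumerate.append("{}: {}".format(j, color))

print(with_range == with_enumerate)
```

If the index is only needed for display, `enumerate(colors, start=1)` gives one-based numbering without any arithmetic in the loop body.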
| From | Stephen Hansen <me@ixokai.io> |
|---|---|
| Date | 2016-05-07 12:23 -0700 |
| Message-ID | <mailman.465.1462649002.32212.python-list@python.org> |
| In reply to | #108275 |
On Sat, May 7, 2016, at 11:52 AM, Christopher Reimer wrote:

> You can do better. You should strive for 10/10 whenever possible,
> figure out why you fall short and ask for help on the parts that don't
> make sense.

I think this is giving far too much weight to pylint's opinion on what is "good" or "bad" programming habits.

--
Stephen Hansen
m e @ i x o k a i . i o
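For anyone who wants to keep a habit pylint objects to, one option alongside a project-wide configuration file is an inline pragma comment. The sketch below is only an illustration: the message names come from the report quoted earlier in the thread, and the variable `maxRetries` is invented.

```python
# Silence one message for this line only.
import os, sys  # pylint: disable=multiple-imports

# A pragma on its own line applies to the rest of the enclosing scope,
# here the rest of the module.
# pylint: disable=invalid-name
maxRetries = 3

print(os.name, sys.platform, maxRetries)
```

Python itself treats these as ordinary comments, so the code runs the same with or without them; only pylint's report changes.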