Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #33473 > unrolled thread

Re: Python Interview Questions

Started bychinjannisha@gmail.com
First post2012-11-17 10:01 -0800
Last post2012-11-19 07:02 +1100
Articles 20 on this page of 24 — 10 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Python Interview Questions chinjannisha@gmail.com - 2012-11-17 10:01 -0800
    Re: Python Interview Questions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-11-18 01:54 -0500
    Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-18 09:39 +0000
      Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-18 08:53 -0500
        Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-18 16:50 +0000
          Re: Python Interview Questions "D'Arcy J.M. Cain" <darcy@druid.net> - 2012-11-18 12:16 -0500
          Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-18 12:53 -0500
            Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-19 00:31 +0000
              Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-18 21:09 -0500
                Re: Python Interview Questions Chris Angelico <rosuav@gmail.com> - 2012-11-19 13:18 +1100
                Re: Python Interview Questions Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-11-19 02:42 +0000
                Re: Python Interview Questions Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-18 23:01 -0700
                Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-19 07:54 +0000
                  Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-19 09:30 -0500
                    Re: Python Interview Questions Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-19 09:44 -0700
                    Re: Python Interview Questions Terry Reedy <tjreedy@udel.edu> - 2012-11-19 15:41 -0500
                    Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-19 23:42 +0000
                      Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-19 21:33 -0500
                  Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-19 09:59 -0500
                    Re: Python Interview Questions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-19 23:53 +0000
                      Re: Python Interview Questions Roy Smith <roy@panix.com> - 2012-11-19 22:14 -0500
                    RE: Python Interview Questions "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-19 23:57 +0000
                Re: Python Interview Questions Terry Reedy <tjreedy@udel.edu> - 2012-11-19 03:27 -0500
          Re: Python Interview Questions Chris Angelico <rosuav@gmail.com> - 2012-11-19 07:02 +1100

Page 1 of 2  [1] 2  Next page →


#33473 — Re: Python Interview Questions

Fromchinjannisha@gmail.com
Date2012-11-17 10:01 -0800
SubjectRe: Python Interview Questions
Message-ID<55443eb7-847c-4f4c-8d04-1e6b507aac00@googlegroups.com>
Hi I had one doubt.. I know very little bit of python .I wanted to know when to use list,tuple,dictionary and set? Please reply me asap

thanks

[toc] | [next] | [standalone]


#33493

FromDennis Lee Bieber <wlfraed@ix.netcom.com>
Date2012-11-18 01:54 -0500
Message-ID<mailman.3786.1353221697.27098.python-list@python.org>
In reply to#33473
On Sat, 17 Nov 2012 10:01:01 -0800 (PST), chinjannisha@gmail.com
declaimed the following in gmane.comp.python.general:

> Hi I had one doubt.. I know very little bit of python .I wanted to know when to use list,tuple,dictionary and set? Please reply me asap
>
	They are used when they are appropriate to the algorithm being
coded.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

[toc] | [prev] | [next] | [standalone]


#33495

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-18 09:39 +0000
Message-ID<50a8acdc$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to#33473
On Sat, 17 Nov 2012 10:01:01 -0800, chinjannisha wrote:

> Hi I had one doubt.. I know very little bit of python .I wanted to know
> when to use list,tuple,dictionary and set? Please reply me asap

Use a list when you want a list of items that should all be treated the 
same way:

list_of_numbers = [1, 3, 5.1, 2, 6.5]

total = sum(list_of_numbers)


or when you need a collection of items where the order they are in is 
important:

winners = ['george', 'susan', 'henry']  # 1st, 2nd, 3rd place

print('The winner is:', winners[0])


Use a tuple when you want a collection of items that mean different 
things, a bit like a C struct or Pascal record:

a = (23, "henry", True, 'engineer')
b = (42, "natalie", False, 'manager')
c = (17, "george", True, 'student')


Use a dict when you need a collection of key:value mappings:

config = {'name': 'fred', 'pagesize': 'A4', 'verbose': True, 'size': 18}
address = {'fred': 'fred@example.com', 'sally': 'sally_smith@example.com'}

if config['verbose']:
    print('sending email...')
send_email_to(address['fred'], "Subject: Hello")


Use a set when you want to represent a collection of items and the order 
is not important:

failed = {'fred', 'tom', 'sally'}  # set literal syntax in Python 3 only
# use set(['fred', 'tom', 'sally']) in Python 2

if 'george' in failed:
    print('George, you have failed!')



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#33499

FromRoy Smith <roy@panix.com>
Date2012-11-18 08:53 -0500
Message-ID<roy-EFE1F1.08532518112012@news.panix.com>
In reply to#33495
In article <50a8acdc$0$29978$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> Use a list when you want a list of items that should all be treated the 
> same way [...] or when you need a collection of items where the order they are in is 
> important:
>
> Use a tuple when you want a collection of items that mean different 
> things, a bit like a C struct or Pascal record:

That is certainly the right answer according to the One True Church Of 
Pythonic Orthodoxy And Theoretical Correctness.  But, let me give an 
alternative answer which works for The Unwashed Masses Who Live In The 
Trenches And Write Python Code For A Living:

Use a list when you need an ordered collection which is mutable (i.e. 
can be altered after being created).  Use a tuple when you need an 
immutable list (such as for a dictionary key).

[toc] | [prev] | [next] | [standalone]


#33507

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-18 16:50 +0000
Message-ID<50a911ec$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to#33499
On Sun, 18 Nov 2012 08:53:25 -0500, Roy Smith wrote:

> In article <50a8acdc$0$29978$c3e8da3$5496439d@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> 
>> Use a list when you want a list of items that should all be treated the
>> same way [...] or when you need a collection of items where the order
>> they are in is important:
>>
>> Use a tuple when you want a collection of items that mean different
>> things, a bit like a C struct or Pascal record:
> 
> That is certainly the right answer according to the One True Church Of
> Pythonic Orthodoxy And Theoretical Correctness.

Oh I'm sorry, did something I say suggest that the couple of examples I 
gave are the *only* acceptable uses? My apologies for not giving an 
exhaustive list of every possible use of lists and tuples, I'll be sure 
to punish myself severely for the lapse.


> But, let me give an
> alternative answer which works for The Unwashed Masses Who Live In The
> Trenches And Write Python Code For A Living:
> 
> Use a list when you need an ordered collection which is mutable (i.e.
> can be altered after being created).  Use a tuple when you need an
> immutable list (such as for a dictionary key).

I keep hearing about this last one, but I wonder... who *actually* does 
this? I've created many, many lists over the years -- lists of names, 
lists of phone numbers, lists of directory search paths, all sorts of 
things. I've never needed to use one as a dictionary key.

Under what sort of circumstances would somebody want to take a mutable 
list of data, say a list of email addresses, freeze it into a known 
state, and use that frozen state as a key in a dict? What would be the 
point? Even if there was some meaningful reason to look up "this list of 
12000 email addresses" as a single key, it is going to get out of sync 
with the actual mutable list.

Sure, I have built a collection of items as a list, because lists are 
mutable, then frozen it into a tuple, and *thrown the list away*, then 
used the tuple as a key. But that's not the same thing, the intent is 
different. In my case, the data was never intended to be a list, it was 
always intended to be a fixed record-like collection, the use of list was 
as a temporary data structure used for construction. A bit like the idiom 
of ''.join(some_list).

But I can't think of any meaningful, non-contrived example where I might 
want an actual mutable list of values as a dict key.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#33508

From"D'Arcy J.M. Cain" <darcy@druid.net>
Date2012-11-18 12:16 -0500
Message-ID<mailman.3796.1353259507.27098.python-list@python.org>
In reply to#33507
On 18 Nov 2012 16:50:52 GMT
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 18 Nov 2012 08:53:25 -0500, Roy Smith wrote:
>> > Use a list when you need an ordered collection which is mutable
> > (i.e. can be altered after being created).  Use a tuple when you
> > need an immutable list (such as for a dictionary key).
> 
> I keep hearing about this last one, but I wonder... who *actually*
> does this? I've created many, many lists over the years -- lists of
> names, lists of phone numbers, lists of directory search paths, all
> sorts of things. I've never needed to use one as a dictionary key.

Well, as long as *you* never needed it then...

CellBlock = 9 # There's a riot going on...
Cell = 17
Bunk = "top"

Prisoner = {(CellBlock, Cell, Bunk): "Bernie Madoff"}

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
IM: darcy@Vex.Net

[toc] | [prev] | [next] | [standalone]


#33509

FromRoy Smith <roy@panix.com>
Date2012-11-18 12:53 -0500
Message-ID<roy-B2D5FF.12535018112012@news.panix.com>
In reply to#33507
In article <50a911ec$0$29978$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> Oh I'm sorry, did something I say suggest that the couple of examples I 
> gave are the *only* acceptable uses? My apologies for not giving an 
> exhaustive list of every possible use of lists and tuples, I'll be sure 
> to punish myself severely for the lapse.

Hmmm.  I didn't mean any offense.  I was just pointing out that what's 
true in theory and what's true in practice aren't always the same.

> Under what sort of circumstances would somebody want to take a mutable 
> list of data, say a list of email addresses, freeze it into a known 
> state, and use that frozen state as a key in a dict?

I've got a script which trolls our log files looking for python stack 
dumps.  For each dump it finds, it computes a signature (basically, a 
call sequence which led to the exception) and uses this signature as a 
dictionary key.  Here's the relevant code (abstracted slightly for 
readability):

def main(args):
    crashes = {}
    [...]
    for line in open(log_file):
        if does_not_look_like_a_stack_dump(line):
             continue
        lines = traceback_helper.unfold(line)
        header, stack = traceback_helper.extract_stack(lines)
        signature = tuple(stack)
        if signature in crashes:
            count, header = crashes[signature]
            crashes[signature] = (count + 1, header)
        else:
            crashes[signature] = (1, header)

You can find traceback_helper at 
https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/traceba
ck_helper.py

The stack that's returned is a list.  It's inherently a list, per the 
classic definition:

* It's variable length.  Different stacks have different depths.

* It's homogeneous.  There's nothing particularly significant about each 
entry other than it's the next one in the stack.

* It's mutable.  I can build it up one item at a time as I discover them.

* It's ordered.  f1(f2()) is not the same as f2(f1()).

But, to use it as a dictionary key, I need to make it into a tuple, 
since keys have to be immutable.

[toc] | [prev] | [next] | [standalone]


#33515

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-19 00:31 +0000
Message-ID<50a97de0$0$29983$c3e8da3$5496439d@news.astraweb.com>
In reply to#33509
On Sun, 18 Nov 2012 12:53:50 -0500, Roy Smith wrote:

> I've got a script which trolls our log files looking for python stack
> dumps.  For each dump it finds, it computes a signature (basically, a
> call sequence which led to the exception) and uses this signature as a
> dictionary key.  Here's the relevant code (abstracted slightly for
> readability):
> 
> def main(args):
>     crashes = {}
>     [...]
>     for line in open(log_file):
>         if does_not_look_like_a_stack_dump(line):
>              continue
>         lines = traceback_helper.unfold(line)
>         header, stack = traceback_helper.extract_stack(lines)
>         signature = tuple(stack)
>         if signature in crashes:
>             count, header = crashes[signature]
>             crashes[signature] = (count + 1, header)
>         else:
>             crashes[signature] = (1, header)
> 
> You can find traceback_helper at
> https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/
> traceback_helper.py
> 
> The stack that's returned is a list.  It's inherently a list, per the
> classic definition:

Er, no, it's inherently a blob of multiple text lines. Sure, you've built 
it a line at a time by using a list, but I've already covered that case. 
Once you've identified a stack, you never append to it, sort it, delete 
lines in the middle of it... none of these list operations are meaningful 
for a Python stack trace. The stack becomes a fixed string, and not just 
because you use it as a dict key, but because inherently it counts as a 
single, immutable blob of lines.

A tuple of individual lines is one reasonable data structure for a blob 
of lines. Another would be a single string:

    signature = '\n'.join(stack)

Depending on what you plan to do with the signatures, one or the other 
implementation might be better. I'm sure that there are other data 
structures as well.


> * It's variable length.  Different stacks have different depths.

Once complete, the stack trace is fixed length, but that fixed length is 
different from one stack to the next. Deleting a line would make it 
incomplete, and adding a line would make it invalid.


> * It's homogeneous.  There's nothing particularly significant about each
> entry other than it's the next one in the stack.
> 
> * It's mutable.  I can build it up one item at a time as I discover
> them.

The complete stack trace is inhomogeneous and immutable. I've already 
covered immutability above: removing, adding or moving lines will 
invalidate the stack trace. Inhomogeneity comes from the structure of a 
stack trace. The mere fact that each line is a string does not mean that 
any two lines are equivalent. Different lines represent different things.

Traceback (most recent call last):
  File "./prattle.py", line 873, in select
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
    raise callback
ValueError: what do you mean?

is a valid stack. But:

Traceback (most recent call last):
    raise callback
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
ValueError: what do you mean?
  File "./prattle.py", line 873, in select

is not. A stack trace has structure. The equivalent here is the 
difference between:

ages = [23, 42, 19, 67,  # age, age, age, age
        17, 94, 32, 51,  # ...
        ]

values = [23, 1972, 1, 34500,  # age, year, number of children, income
          35, 1985, 0, 67900,  # age, year, number of children, income
          ]

A stack trace is closer to the second example than the first: each item 
may be the same type, but the items don't represent the same *kind of 
thing*. 


You could make a stack trace homogeneous with a little work:

- drop the Traceback line and the final exception line;
- parse the File lines to extract the useful fields;
- combine them with the source code.

Now you have a blob of homogeneous records, here shown as lines of text 
with ! as field separator:

./prattle.py ! 873 ! select ! selection = self.do_callback(cb, response)
./prattle.py ! 787 ! do_callback ! raise callback

But there's really nothing you can do about the immutability. There isn't 
any meaningful reason why you might want to take a complete stack trace 
and add or delete lines from it.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#33522

FromRoy Smith <roy@panix.com>
Date2012-11-18 21:09 -0500
Message-ID<roy-BD53B0.21093618112012@news.panix.com>
In reply to#33515
In article <50a97de0$0$29983$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:


> > The stack that's returned is a list.  It's inherently a list, per the
> > classic definition:
> 
> Er, no, it's inherently a blob of multiple text lines.

No, it's a list that looks like (taken from the doc string of the code I 
referenced):

    [('/usr/lib/.../base.py', 'get_response'),
     ('/home/songza/.../views.py', 'song_info'),
     ('/home/songza.../api.py', 'get_song'),
     ('/home/songza/.../api.py', 'api')]

[it doesn't really have ...'s in the paths; I just elided some text to 
make it easier to read]

> > * It's homogeneous.  There's nothing particularly significant about each
> > entry other than it's the next one in the stack.
> 
> The complete stack trace is inhomogeneous and immutable. I've already 
> covered immutability above: removing, adding or moving lines will 
> invalidate the stack trace. Inhomogeneity comes from the structure of a 
> stack trace. The mere fact that each line is a string does not mean that 
> any two lines are equivalent. Different lines represent different things.

No.  Each entry in the list represents a source file and a function 
name.  They're all the same "shape".  You could remove one or add 
another one, or shuffle the order, and you would have something which 
was syntactically correct and semantically meaningful (even if it didn't 
represent an actual code path.

> - drop the Traceback line and the final exception line;
> - parse the File lines to extract the useful fields;
> - combine them with the source code.

You mean, kind of like the code I cited does? :-)

I think we're going to have to let this be.  You obviously have your 
concept of what a tuple is and what a list is.  I disagree.  I don't 
think either of us is right or wrong, we just have different ways of 
thinking about things.

You come at it from a theoretical point of view.  You think of each type 
as an embodiment of certain concepts ("it represents a fixed-length 
heterogenous sequence").  Your thinking is driven by what each type was 
intended to be used for.

I come at it from a practical point of view.  To me, each type is a 
collection of methods.  I have certain operations I need to perform.  I 
pick the type which offers those operations.  If the set of operations I 
need to perform (in this case, {append, hash}) don't exist in a single 
type, I'm forced to use both types and convert from one to the other as 
needed.

The theorist understands that a chisel and a screwdriver were intended 
for different purposes, but the pragmatist gets the paint can open.

[toc] | [prev] | [next] | [standalone]


#33524

FromChris Angelico <rosuav@gmail.com>
Date2012-11-19 13:18 +1100
Message-ID<mailman.3802.1353291501.27098.python-list@python.org>
In reply to#33522
On Mon, Nov 19, 2012 at 1:09 PM, Roy Smith <roy@panix.com> wrote:
> The theorist understands that a chisel and a screwdriver were intended
> for different purposes, but the pragmatist gets the paint can open.

A good tool can always be used in ways its inventor never intended -
and it will function as its user expects.

$ some_program | egrep --color=always '(ERROR|^)'

will highlight the word ERROR in red anywhere it appears in the
program's output, while maintaining all other lines without color. Not
normal use of grep, to be sure, but quite functional.

A tuple may have been intended to be a record, a struct, whatever, but
it is what it is, and I'll use one any time it's the best tool for the
job. Maybe its immutability is critical; or maybe it's just the most
convenient syntax and all I care about is that it be iterable.

But when I'm explaining grep to someone, I'll describe it as a filter
that keeps only some lines from the original, and when I describe a
tuple, I'll point out that it's immutable and (potentially) hashable.
The obvious first, the unobvious later.

ChrisA

[toc] | [prev] | [next] | [standalone]


#33530

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2012-11-19 02:42 +0000
Message-ID<mailman.3808.1353292877.27098.python-list@python.org>
In reply to#33522
On 19/11/2012 02:09, Roy Smith wrote:
>
> The theorist understands that a chisel and a screwdriver were intended
> for different purposes, but the pragmatist gets the paint can open.
>

To throw a chiseldriver into the works, IIRC a tuple is way faster to 
create but accessing a list is much faster.  The obvious snag is that 
may have been Python 2.7 whereas 3.3 is completely different.  Sorry but 
I'm currently wearing my XXXL size Lazy Bone Idle Hat so have no figures 
to back my probably incorrect memory up, anyone know anything about this?

-- 
Cheers.

Mark Lawrence.

[toc] | [prev] | [next] | [standalone]


#33533

FromIan Kelly <ian.g.kelly@gmail.com>
Date2012-11-18 23:01 -0700
Message-ID<mailman.3814.1353304941.27098.python-list@python.org>
In reply to#33522
On Sun, Nov 18, 2012 at 7:42 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> To throw a chiseldriver into the works, IIRC a tuple is way faster to create
> but accessing a list is much faster.  The obvious snag is that may have been
> Python 2.7 whereas 3.3 is completely different.  Sorry but I'm currently
> wearing my XXXL size Lazy Bone Idle Hat so have no figures to back my
> probably incorrect memory up, anyone know anything about this?

It's not been my experience with Python 2.7 that list access is faster
than tuple access.  Tuples are as fast as or faster than lists, pretty
much universally.  They seem to have closed the gap a bit in
Python 3.3, though, as the following timings show.  For one-shot
construction, tuples seem to be more efficient for short sequences,
but then lists win for longer sequences, although not by much.  Of
course, lists are always going to be much slower if you build them up
with appends and extends.

C:\>python -m timeit -s "x = range(10)" "tuple(x)"
1000000 loops, best of 3: 0.773 usec per loop

C:\>python -m timeit -s "x = range(10)" "list(x)"
1000000 loops, best of 3: 0.879 usec per loop

C:\>python -m timeit -s "x = range(100)" "tuple(x)"
100000 loops, best of 3: 2.88 usec per loop

C:\>python -m timeit -s "x = range(100)" "list(x)"
100000 loops, best of 3: 2.63 usec per loop

C:\>python -m timeit -s "x = range(1000)" "tuple(x)"
10000 loops, best of 3: 37.4 usec per loop

C:\>python -m timeit -s "x = range(1000)" "list(x)"
10000 loops, best of 3: 36.2 usec per loop

C:\>python -m timeit -s "x = range(10000)" "tuple(x)"
1000 loops, best of 3: 418 usec per loop

C:\>python -m timeit -s "x = range(10000)" "list(x)"
1000 loops, best of 3: 410 usec per loop


For iteration, tuples are consistently 7-8% faster.


C:\>python -m timeit -s "x = tuple(range(10))" "for i in x: pass"
1000000 loops, best of 3: 0.467 usec per loop

C:\>python -m timeit -s "x = list(range(10))" "for i in x: pass"
1000000 loops, best of 3: 0.498 usec per loop

C:\>python -m timeit -s "x = tuple(range(100))" "for i in x: pass"
100000 loops, best of 3: 3.31 usec per loop

C:\>python -m timeit -s "x = list(range(100))" "for i in x: pass"
100000 loops, best of 3: 3.56 usec per loop

C:\>python -m timeit -s "x = tuple(range(1000))" "for i in x: pass"
10000 loops, best of 3: 31.6 usec per loop

C:\>python -m timeit -s "x = list(range(1000))" "for i in x: pass"
10000 loops, best of 3: 34.3 usec per loop

C:\>python -m timeit -s "x = tuple(range(10000))" "for i in x: pass"
1000 loops, best of 3: 318 usec per loop

C:\>python -m timeit -s "x = list(range(10000))" "for i in x: pass"
1000 loops, best of 3: 341 usec per loop


For direct item access, tuples seem to be about 2-3% faster.


C:\>python -m timeit -s "import operator as o; x = tuple(range(10)); g
= o.itemgetter(*range(len(x)))" "g(x)"
1000000 loops, best of 3: 0.67 usec per loop

C:\>python -m timeit -s "import operator as o; x = list(range(10)); g
= o.itemgetter(*range(len(x)))" "g(x)"
1000000 loops, best of 3: 0.674 usec per loop

C:\>python -m timeit -s "import operator as o; x = tuple(range(100));
g = o.itemgetter(*range(len(x)))" "g(x)"
100000 loops, best of 3: 4.52 usec per loop

C:\>python -m timeit -s "import operator as o; x = list(range(100)); g
= o.itemgetter(*range(len(x)))" "g(x)"
100000 loops, best of 3: 4.65 usec per loop

C:\>python -m timeit -s "import operator as o; x = tuple(range(1000));
g = o.itemgetter(*range(len(x)))" "g(x)"
10000 loops, best of 3: 43.2 usec per loop

C:\>python -m timeit -s "import operator as o; x = list(range(1000));
g = o.itemgetter(*range(len(x)))" "g(x)"
10000 loops, best of 3: 43.7 usec per loop

C:\>python -m timeit -s "import operator as o; x =
tuple(range(10000)); g = o.itemgetter(*range(len(x)))" "g(x)"
1000 loops, best of 3: 422 usec per loop

C:\>python -m timeit -s "import operator as o; x = list(range(10000));
g = o.itemgetter(*range(len(x)))" "g(x)"
1000 loops, best of 3: 447 usec per loop

[toc] | [prev] | [next] | [standalone]


#33534

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-19 07:54 +0000
Message-ID<50a9e5cf$0$21863$c3e8da3$76491128@news.astraweb.com>
In reply to#33522
On Sun, 18 Nov 2012 21:09:36 -0500, Roy Smith wrote:

> In article <50a97de0$0$29983$c3e8da3$5496439d@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> 
> 
>> > The stack that's returned is a list.  It's inherently a list, per the
>> > classic definition:
>> 
>> Er, no, it's inherently a blob of multiple text lines.
> 
> No, it's a list that looks like (taken from the doc string of the code I
> referenced):
> 
>     [('/usr/lib/.../base.py', 'get_response'),
>      ('/home/songza/.../views.py', 'song_info'),
>      ('/home/songza.../api.py', 'get_song'), 
>      ('/home/songza/.../api.py', 'api')]
> 
> [it doesn't really have ...'s in the paths; I just elided some text to
> make it easier to read]

I see. It wasn't clear from your earlier description that the items had 
been post-processed from collections of raw log lines to fixed records. 
But it doesn't actually change my analysis any. See below.

By the way, based on the sample data you show, your script is possibly 
broken. You don't record either the line number that raises, or the 
exception raised, so your script doesn't differentiate between different 
errors that happen to occur with similar stack traces. (I say "possibly" 
broken because I don't know what your requirements are. Maybe your 
requirements are sufficiently wide that you don't care that distinct 
failures are counted together.)

E.g. these three stack traces will probably generate the same fixed 
record, even though the errors are distinct:

#1
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 60, in func
    return 1/x
ZeroDivisionError: float division


#2
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 60, in func
    return 1/x
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'


#3
Traceback (most recent call last):
  File "./spam.py", line 20, in select
    selection = func(a, b)
  File "./spam.py", line 55, in func
    y = 1/(a + b)
ZeroDivisionError: float division


Maybe that's okay for your application, but it strikes me as odd that you 
do distinguish *some* distinct errors in the same function, but not 
others.



>> > * It's homogeneous.  There's nothing particularly significant about
>> > each entry other than it's the next one in the stack.
>> 
>> The complete stack trace is inhomogeneous and immutable. I've already
>> covered immutability above: removing, adding or moving lines will
>> invalidate the stack trace. Inhomogeneity comes from the structure of a
>> stack trace. The mere fact that each line is a string does not mean
>> that any two lines are equivalent. Different lines represent different
>> things.
> 
> No.  Each entry in the list represents a source file and a function
> name.  They're all the same "shape".  You could remove one or add
> another one, or shuffle the order, and you would have something which
> was syntactically correct and semantically meaningful (even if it didn't
> represent an actual code path.

If you remove/add/shuffle lines in the stack, you no longer have the same 
stack. Take the example you gave before:

stack1 = [('/usr/lib/.../base.py', 'get_response'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/home/songza.../api.py', 'get_song'), 
          ('/home/songza/.../api.py', 'api')
          ]

Here's a different stack trace, representing a different code path, which 
as you say is syntactically correct and semantically meaningful:

stack2 = [('/home/songza/.../api.py', 'api'),
          ('/home/songza.../api.py', 'get_song'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/usr/lib/.../base.py', 'get_response')
          ]

Since they are different stacks, they are treated as different keys:

data = {stack1: 11, stack2: 22}

Do you agree that this is what your application expects? Different stack 
traces are different keys, associated with different values.

I claim this only makes sense if you treat the stacks as inherently 
immutable. Never mind Python's limitation. Let's pretend we were running 
this code under some other language, NeoPython, which allowed mutable 
keys.

You claim that stacks are *inherently mutable*. So I should be able to do 
this:

stack1.sort()  # it's the *same stack*, all I've done is mutate it
print data[stack1]

and expect to see "11" printed. I am looking at the same key, right? So I 
certainly don't expect to see the value associated with a completely 
different key.

But wait a minute... after sorting, stack1 and stack2 now are equal. I 
could just as easily expect to see "22" printed.

I thought we had just agreed that stack1 and stack2 are *different* keys. 
Of course they are different. They represent different code paths. But 
after sorting stack1, it looks exactly like stack2. It looks like a 
different code path. It *lies* -- it no longer represents the code path 
that it actually represents, instead it looks like a *different* code 
path.

I then generate another stack:

stack3 = [('/home/songza/.../api.py', 'api'),
          ('/home/songza.../api.py', 'get_song'),
          ('/home/songza/.../views.py', 'song_info'),
          ('/usr/lib/.../base.py', 'get_response')
          ]

should data[stack3] return 11 (it has the same value as stack1) or 22 (it 
has the same value as stack2)? Or possibly 33? Or raise KeyError?

Treating stacks in this context as mutable is *incoherent*. It is nice 
and convenient to be able to build up a stack trace using a mutable list, 
you won't get an argument from me about that, but that can only be 
considered a temporary data structure used to build the data structure 
you actually care about, which is fixed.

That brings it back to my question: your application is not a counter-
example to my question about using lists as keys, because your data is 
not inherently list-like. It is inherently tuple-like, you just build it 
using a temporary list. That's perfectly fine, by the way, I do the same 
thing.

As you say, the order of the lines in the stack trace is significant. You 
cannot expect to mutate the stack and move lines around and treat it as 
the same stack. If you move the lines about, it represents a different 
stack. That is fundamentally different from the normal use of a list, 
where you do expect to be able to move lines about and still have it 
count as "the same list".


> I think we're going to have to let this be.  You obviously have your
> concept of what a tuple is and what a list is.  I disagree.  

I think a tuple is an immutable sequence of items, and a list is a 
mutable sequence of items.


> I don't
> think either of us is right or wrong, we just have different ways of
> thinking about things.
> 
> You come at it from a theoretical point of view.

I certainly do not. My position here is imminently practical. The 
alternative, the mutability of keys, is simply incoherent.


> You think of each type
> as an embodiment of certain concepts ("it represents a fixed-length
> heterogenous sequence").  Your thinking is driven by what each type was
> intended to be used for.

Not even close. My thinking is driven by the things each data structure 
needs to do. See below.


> I come at it from a practical point of view.  To me, each type is a
> collection of methods.  I have certain operations I need to perform.  I
> pick the type which offers those operations.  If the set of operations I
> need to perform (in this case, {append, hash}) don't exist in a single
> type, I'm forced to use both types and convert from one to the other as
> needed.

I don't see that as a problem. Converting from one type to another is 
exactly the sort of thing I described in my earlier question.

In your application, you build up a collection of code lines that 
represent a stack trace. Here's that example from your own documentation 
again:

[('/usr/lib/.../base.py', 'get_response'),
 ('/home/songza/.../views.py', 'song_info'),
 ('/home/songza.../api.py', 'get_song'), 
 ('/home/songza/.../api.py', 'api')]

What are the sorts of things I might meaningfully want to do with this 
*complete* stack trace?

Add extra lines to it? No. If I needed to add extra lines, it wouldn't be 
complete.

Delete lines? Certainly not, that would change the code path it claims to 
represent to a code path it doesn't represent.

Sort the list? Reverse it? Heavens no.

If you look at the available list methods, *not one* of the mutating 
methods is appropriate to a completed stack trace object. *None* of the 
mutator list methods are appropriate once the stack trace object is 
complete, and using them would be counter-productive.

If you believe different, then please tell me what mutations your code 
actually performs after the stack trace object is completed. In the code 
you showed, you throw the list away after turning it into a tuple.

If the object represents a "list of code lines", in the sense of a 
mutable Python list rather than a mere sequence, then why don't you use 
any list methods on it?

The append method is useful during construction, but that is all. After 
the stack is complete, use of any mutator method would be a bug. In other 
words, it ought to be immutable, and the use of a list ought to be buried 
in the appropriate function as an internal implementation detail. The 
public interface ought to be that of an immutable tuple of immutable 
strings, because once you have finished building the object, it should 
not be possible to mutate it.

This is hardly a theoretical viewpoint. The idea of treating data that 
ought not be changed as immutable is borne out of bitter experience of 
millions of man-hours tracking down hundreds of thousands of bugs.

(Admittedly not all of those bugs were *my* bugs. I'm talking the 
collective experience of programmers over fifty years of coding.)



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#33539

FromRoy Smith <roy@panix.com>
Date2012-11-19 09:30 -0500
Message-ID<roy-670C61.09305419112012@news.panix.com>
In reply to#33534
In article <50a9e5cf$0$21863$c3e8da3$76491128@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> I see. It wasn't clear from your earlier description that the items had 
> been post-processed from collections of raw log lines to fixed records.

Well, I did provide the code that does this.
 
> But it doesn't actually change my analysis any. See below.
> 
> By the way, based on the sample data you show, your script is possibly 
> broken. You don't record either the line number that raises, or the 
> exception raised, so your script doesn't differentiate between different 
> errors that happen to occur with similar stack traces. 

You really might want to read the code I provided.  Here's the reference 
again:

https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/traceba
ck_helper.py

The "header" referred to does indeed contain the exception raised.  And 
the line numbers are included.  Here's a typical output stanza:

2012-11-19T00:00:15+00:00 web5 ˇ˛2012-11-19 00:00:15,831 [2712]: 
songza-api IGPhwNU2SJ691cx8 4C0ABFA9-50A974E7-384995 W6D-HSO 
173.145.137.54 songza.django.middleware ERROR process_exception() Path = 
u'/api/1/station/1459775/next', Exception = 
ValueError(u"<SequentialSongPicker: <Station 1459775: u'Old School 
105.3'>>: no song ids for mp3",)
/home/songza/env/python/local/lib/python2.7/site-packages/django/core/han
dlers/base.py:111:get_response()
/home/songza/deploy/current/pyza/djapi/decorators.py:11:_wrapped_view_fun
c()
/home/songza/env/python/local/lib/python2.7/site-packages/django/views/de
corators/http.py:45:inner()
/home/songza/deploy/current/pyza/djapi/views.py:1659:station_next()
/home/songza/deploy/current/pyza/models/station.py:660:next_song()
/home/songza/deploy/current/pyza/lib/song_picker.py:327:pick()

> I say "possibly" broken because I don't know what your requirements are.

Our requirements are to scan the logs of a production site and filter 
down the gobs and gobs of output (we produced 70 GB of log files 
yesterday) into something small enough that a human can see what the 
most common failures were.  The tool I wrote does that.

The rest of this conversation is just silly.  It's turning into getting 
hit on the head lessons.

[toc] | [prev] | [next] | [standalone]


#33546

FromIan Kelly <ian.g.kelly@gmail.com>
Date2012-11-19 09:44 -0700
Message-ID<mailman.4.1353343501.29569.python-list@python.org>
In reply to#33539
On Mon, Nov 19, 2012 at 7:30 AM, Roy Smith <roy@panix.com> wrote:
> In article <50a9e5cf$0$21863$c3e8da3$76491128@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>
>> By the way, based on the sample data you show, your script is possibly
>> broken. You don't record either the line number that raises, or the
>> exception raised, so your script doesn't differentiate between different
>> errors that happen to occur with similar stack traces.
>
> You really might want to read the code I provided.  Here's the reference
> again:
>
> https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/traceba
> ck_helper.py
>
> The "header" referred to does indeed contain the exception raised.  And
> the line numbers are included.  Here's a typical output stanza:

Yes, but the dict is still keyed on the traceback alone, and only the
first header for a particular traceback is stored.  If two different
exceptions occur at the same line of code and sharing the same
traceback, the second exception would be counted as a second
occurrence of the first, effectively squashing any reporting of the
second exception.

[toc] | [prev] | [next] | [standalone]


#33552

FromTerry Reedy <tjreedy@udel.edu>
Date2012-11-19 15:41 -0500
Message-ID<mailman.8.1353357720.29569.python-list@python.org>
In reply to#33539
On 11/19/2012 9:30 AM, Roy Smith wrote:

> Our requirements are to scan the logs of a production site and filter
> down the gobs and gobs of output (we produced 70 GB of log files
> yesterday) into something small enough that a human can see what the
> most common failures were.  The tool I wrote does that.
>
> The rest of this conversation is just silly.  It's turning into getting
> hit on the head lessons.

I agree. In early Python, tuples were more different from lists than 
they are today. They did not have any (public) methods. Today, they have 
.index and .count methods, which make little sense from the 'tuple is a 
record' viewpoint. The addition of those methods redefined tuples as 
read-only (and therefore hashable) sequences.

 From the collections.abc doc
'''
Sequence | Sized, Iterable, Container |
  __getitem__ __contains__, __iter__, __reversed__, index, and count
...
class collections.abc.Sequence
class collections.abc.MutableSequence
ABCs for read-only and mutable sequences.
'''
 >>> from collections.abc import Sequence
 >>> issubclass(tuple, Sequence)
True

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]


#33565

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-19 23:42 +0000
Message-ID<50aac3d8$0$29983$c3e8da3$5496439d@news.astraweb.com>
In reply to#33539
On Mon, 19 Nov 2012 09:30:54 -0500, Roy Smith wrote:

> In article <50a9e5cf$0$21863$c3e8da3$76491128@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> 
>> I see. It wasn't clear from your earlier description that the items had
>> been post-processed from collections of raw log lines to fixed records.
> 
> Well, I did provide the code that does this.

You did? When? [goes back and looks]

Oh, so you did. Oops.

By the way, your news client seems to be mangling long URLs, by splitting 
them when they exceed the maximum line length. I didn't follow the link 
you gave because it was mangled, and then forgot it even existed. Sorry 
about that.


[...]
> You really might want to read the code I provided.  Here's the reference
> again:
> 
> https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/
traceba
> ck_helper.py

And mangled again :)


> The "header" referred to does indeed contain the exception raised.  And
> the line numbers are included.  Here's a typical output stanza:
[snip]

Ian Kelly has picked up on what I'm trying to say. You might collect the 
traceback in the "header", but it doesn't get used in the key, and each 
time you find a repeated stack trace, you toss away whatever header you 
just saw and keep the header you saw the first time.

[quote]
        header, stack = traceback_helper.extract_stack(lines)
        signature = tuple(stack)
        if signature in crashes:
            count, header = crashes[signature]
            crashes[signature] = (count + 1, header)
        else:
            crashes[signature] = (1, header)
[end quote]


In general, it is an unsafe assumption that the actual exception raised 
will be the same just because the stack trace is the same. So as I said, 
if you have two *distinct* failures occurring in the same function (not 
even necessarily on the same line), your code appears to treat them as 
the same error. That seems odd to me, but if you have a good reason for 
doing it that way, so be it.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#33581

FromRoy Smith <roy@panix.com>
Date2012-11-19 21:33 -0500
Message-ID<roy-F2CF4C.21334219112012@news.panix.com>
In reply to#33565
In article <50aac3d8$0$29983$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> By the way, your news client seems to be mangling long URLs, by splitting 
> them when they exceed the maximum line length.

Hmmm.  So it did.  My bad.

[toc] | [prev] | [next] | [standalone]


#33541

FromRoy Smith <roy@panix.com>
Date2012-11-19 09:59 -0500
Message-ID<roy-03667B.09591919112012@news.panix.com>
In reply to#33534
OK, I've just read back over the whole thread.  I'm really struggling to 
understand what point you're trying to make.  I started out by saying:

> Use a list when you need an ordered collection which is mutable (i.e. 
> can be altered after being created).  Use a tuple when you need an 
> immutable list (such as for a dictionary key).

To which you obviously objected.  So now you write:

> I think a tuple is an immutable sequence of items, and a list is a 
> mutable sequence of items.

So how is that different from what I said?  Is this whole argument 
boiling down to your use of "immutable sequence" vs. my use of 
"immutable list"?

[toc] | [prev] | [next] | [standalone]


#33567

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-19 23:53 +0000
Message-ID<50aac66c$0$29983$c3e8da3$5496439d@news.astraweb.com>
In reply to#33541
On Mon, 19 Nov 2012 09:59:19 -0500, Roy Smith wrote:

> OK, I've just read back over the whole thread.  I'm really struggling to
> understand what point you're trying to make.  I started out by saying:
> 
>> Use a list when you need an ordered collection which is mutable (i.e.
>> can be altered after being created).  Use a tuple when you need an
>> immutable list (such as for a dictionary key).
> 
> To which you obviously objected.  So now you write:
> 
>> I think a tuple is an immutable sequence of items, and a list is a
>> mutable sequence of items.
> 
> So how is that different from what I said?  Is this whole argument
> boiling down to your use of "immutable sequence" vs. my use of
> "immutable list"?

Sheesh, of course not. Give me some credit.

I gave some examples of when somebody might use lists, tuples, sets and 
dicts. Apparently I forgot a couple, and you responded with a sarcastic 
comment about the "One True Church Of Pythonic Orthodoxy And Theoretical 
Correctness" and gave a couple of additional examples.

Although I didn't come out and *explicitly* say "I agree" to your 
examples, I actually did, with one proviso: your example of using an 
"immutable list" as dict key. So I asked a question about that *specific* 
use-case:

[quote]
Under what sort of circumstances would somebody want to take a mutable
list of data, say a list of email addresses, freeze it into a known state,
and use that frozen state as a key in a dict? What would be the point?
Even if there was some meaningful reason to look up "this list of 12000
email addresses" as a single key, it is going to get out of sync with the
actual mutable list.
[end quote]

Your reply was to give your stack trace script as an example. That's a 
fine example as a use-case for a temporary list, and I've done similar 
things dozens, hundreds of times myself. As I said:

[quote]
Sure, I have built a collection of items as a list, because lists are
mutable, then frozen it into a tuple, and *thrown the list away*, then
used the tuple as a key. But that's not the same thing, the intent is
different. In my case, the data was never intended to be a list, it was
always intended to be a fixed record-like collection, the use of list was
as a temporary data structure used for construction. A bit like the idiom
of ''.join(some_list).
[end quote]

To me, this sounds *exactly* like your use-case: your data, stack traces, 
represent a little chunk of immutable data that you build up a line at a 
time using a temporary list first, just like I wrote. And I said so. 
There's no sign in either your code or your description that the stack 
traces get treated as mutable objects in any way once you have finished 
building them a line at a time. So your real world, practical, "in the 
trenches" example matches my experience: you build a *fixed data record* 
using a *temporary list*, throw the list away, and then never mutate that 
data record again.

So why are we disagreeing? Like many such discussions on the Internet, 
this one has rambled a bit, and I've misunderstood some of your code 
(sorry), and you seem to have misunderstood the question I am asking. 
Maybe my explanation was not clear enough, in which case, sorry again.

I'm asking about the case where one might want the key to remain mutable 
even after it is used as a key, but can't because Python won't let you. 
There's no sign that your stack trace example is such an example.

As I earlier said:

[quote]
But I can't think of any meaningful, non-contrived example where I might
want an actual mutable list of values as a dict key.
[end quote]


and I still can't. 



-- 
Steven

[toc] | [prev] | [next] | [standalone]


Page 1 of 2  [1] 2  Next page →

Back to top | Article view | comp.lang.python


csiph-web