Groups > comp.lang.python > #84454 > unrolled thread
| Started by | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| First post | 2015-01-24 10:16 +0000 |
| Last post | 2015-01-24 22:51 +0100 |
| Articles | 20 on this page of 28 — 7 participants |
__bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 10:16 +0000
Re: __bases__ misleading error message Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-01-24 23:43 +1100
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 22:14 +0100
Re: __bases__ misleading error message Ian Kelly <ian.g.kelly@gmail.com> - 2015-01-24 14:45 -0700
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 23:09 +0100
Re: __bases__ misleading error message Chris Angelico <rosuav@gmail.com> - 2015-01-25 09:25 +1100
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 23:33 +0100
Re: __bases__ misleading error message Chris Angelico <rosuav@gmail.com> - 2015-01-25 09:37 +1100
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 23:59 +0100
Re: __bases__ misleading error message Terry Reedy <tjreedy@udel.edu> - 2015-01-24 16:58 -0500
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 23:02 +0100
Re: __bases__ misleading error message Ian Kelly <ian.g.kelly@gmail.com> - 2015-01-24 15:16 -0700
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 23:36 +0100
Re: __bases__ misleading error message Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-01-25 14:18 +1100
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-25 12:07 +0100
Re: __bases__ misleading error message Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-01-25 23:00 +1100
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-25 13:49 +0100
Re: __bases__ misleading error message Marko Rauhamaa <marko@pacujo.net> - 2015-01-25 14:53 +0200
Re: __bases__ misleading error message Terry Reedy <tjreedy@udel.edu> - 2015-01-25 16:35 -0500
Re: __bases__ misleading error message Ian Kelly <ian.g.kelly@gmail.com> - 2015-01-25 19:21 -0700
Re: __bases__ misleading error message Marco Buttu <marco.buttu@gmail.com> - 2015-01-24 23:09 +0100
Re: __bases__ misleading error message Marco Buttu <marco.buttu@gmail.com> - 2015-01-24 15:12 +0100
Re: __bases__ misleading error message Terry Reedy <tjreedy@udel.edu> - 2015-01-24 14:24 -0500
Re: __bases__ misleading error message Mario Figueiredo <marfig@gmail.com> - 2015-01-24 22:03 +0100
Re: __bases__ misleading error message Marco Buttu <marco.buttu@gmail.com> - 2015-01-24 22:51 +0100
Re: __bases__ misleading error message Terry Reedy <tjreedy@udel.edu> - 2015-01-24 19:55 -0500
Re: __bases__ misleading error message Marco Buttu <marco.buttu@gmail.com> - 2015-01-25 11:30 +0100
Re: __bases__ misleading error message Marco Buttu <marco.buttu@gmail.com> - 2015-01-24 22:51 +0100
Page 1 of 2 [1] 2 Next page →
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 10:16 +0000 |
| Subject | __bases__ misleading error message |
| Message-ID | <1a194e0a0b738d205de54180fa7@nntp.aioe.org> |
Consider the following code at your REPL of choice:
class Super:
    pass
class Sub:
    pass
foo = Sub()
Sub.__bases__
foo.__bases__
The last statement raises the following error:
AttributeError: 'Sub' object has no attribute '__bases__'
Naturally the 'Sub' object has an attribute __bases__. It's the instance
that has not. So shouldn't the error read like:
AttributeError: 'Sub' instance has no attribute '__bases__', or
AttributeError: 'foo' object has no attribute '__bases__'
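A minimal runnable version of the example above, assuming Python 3 (where `class Sub:` implicitly inherits from `object`, so `Sub.__bases__` is `(object,)`). The helper `attr_error_message` is a hypothetical name introduced only for this sketch; it captures the text of the `AttributeError` instead of letting it propagate.

```python
class Super:
    pass

class Sub:
    pass

foo = Sub()

# The class object has __bases__; in Python 3 it implicitly inherits from object.
class_bases = Sub.__bases__

def attr_error_message(obj, name):
    """Return the AttributeError text for obj.<name>, or None if it exists."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return str(exc)
    return None

# The instance has no __bases__: the message names the *type* of the instance.
instance_message = attr_error_message(foo, "__bases__")
print(class_bases)        # (<class 'object'>,)
print(instance_message)   # 'Sub' object has no attribute '__bases__'
```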
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-01-24 23:43 +1100 |
| Message-ID | <54c39366$0$13006$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #84454 |
Mario Figueiredo wrote:
>
> Consider the following code at your REPL of choice:
>
> class Super:
>     pass
Super is irrelevant here, since it isn't used.
> class Sub:
>     pass
>
> foo = Sub()
>
> Sub.__bases__
> foo.__bases__
>
> The last statement raises the following error:
>
> AttributeError: 'Sub' object has no attribute '__bases__'
It's a bit ambiguous, but the way to read it is to think of object as a
synonym for instance. This is, in my opinion, a Java-ism which is
inappropriate for Python where classes are objects too, but we seem to be
stuck with it. So we have a Sub instance (object) which has no attribute
'__bases__'. This is no different from:
py> (23).spam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'spam'
> Naturally the 'Sub' object has an attribute __bases__.
Correct, in the sense that classes are objects too. But in the sense of
object=instance, no. Isn't ambiguous terminology wonderful?
> It's the instance
> that has not. So shouldn't the error read like:
>
> AttributeError: 'Sub' instance has no attribute '__bases__', or
> AttributeError: 'foo' object has no attribute '__bases__'
The first would be nice. The second is impossible: objects may have no name,
one name, or many names, and they do not know what names they are bound to.
So the Sub instance bound to the name 'foo' doesn't know that its name is
'foo', so it cannot display it in the error message.
--
Steven
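A sketch of the point about names, assuming Python 3: one instance reached through two names (`foo`, `bar`) and another reached through no name at all (a list slot in `items`) all yield the same message, because the only thing the message can draw on is the object's type. The names `bar`, `items` and `bases_error` are illustrative, not from the thread.

```python
class Sub:
    pass

foo = Sub()
bar = foo            # a second name bound to the very same object
items = [Sub()]      # an instance reachable only through a list slot, no name

def bases_error(obj):
    """Return the message produced by obj.__bases__ for a non-class object."""
    try:
        obj.__bases__
    except AttributeError as exc:
        return str(exc)
    return None

messages = {bases_error(foo), bases_error(bar), bases_error(items[0])}
print(foo is bar)    # True: one object, two names
print(messages)      # {"'Sub' object has no attribute '__bases__'"}

# What the message *can* use is the object's type, which every object knows.
print(type(items[0]).__name__)   # Sub
```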
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 22:14 +0100 |
| Message-ID | <MPG.2f2e298c14dbed69989692@nntp.aioe.org> |
| In reply to | #84461 |
In article <54c39366$0$13006$c3e8da3$5496439d@news.astraweb.com>,
steve+comp.lang.python@pearwood.info says...
> > AttributeError: 'Sub' instance has no attribute '__bases__',
> > AttributeError: 'foo' object has no attribute '__bases__'
>
> The first would be nice. The second is impossible: objects may have no name,
> one name, or many names, and they do not know what names they are bound to.
> So the Sub instance bound to the name 'foo' doesn't know that its name
> is 'foo', so it cannot display it in the error message.
Thanks for the information! :)
But that begs the OT question: How does Python map names to memory
addresses in the interpreter?
"__main__"
from module import a_name
y = a_name + 1
How does the Python interpreter know how to map 'name' to the correct memory
location, if this __main__ code only runs after the 'module' code?
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-01-24 14:45 -0700 |
| Message-ID | <mailman.18099.1422135976.18130.python-list@python.org> |
| In reply to | #84493 |
On Sat, Jan 24, 2015 at 2:14 PM, Mario Figueiredo <marfig@gmail.com> wrote:
> In article <54c39366$0$13006$c3e8da3$5496439d@news.astraweb.com>,
> steve+comp.lang.python@pearwood.info says...
>> > AttributeError: 'Sub' instance has no attribute '__bases__',
>> > AttributeError: 'foo' object has no attribute '__bases__'
>>
>> The first would be nice. The second is impossible: objects may have no name,
>> one name, or many names, and they do not know what names they are bound to.
>> So the Sub instance bound to the name 'foo' doesn't know that its name
>> is 'foo', so it cannot display it in the error message.
>
> Thanks for the information! :)
>
> But that begs the OT question:
No, it doesn't. http://en.wikipedia.org/wiki/Begging_the_question
> How does Python map names to memory
> addresses in the interpreter?
Global variables are looked up in the current stack frame's globals dict.
>>> a = 1
>>> b = 2
>>> globals()['a']
1
>>> globals()['b']
2
Local variables of functions could be handled the same way, but for
efficiency the compiler instead maps the names to indices of a local
variable array associated with the stack frame. Either way, at the C level
the value stored in the dict or array is a pointer to the memory location
of the object.
> "__main__"
> from module import a_name
> y = a_name + 1
>
> How does the Python interpreter know how to map 'name' to the correct memory
> location, if this __main__ code only runs after the 'module' code?
I'm not sure I'm understanding what you're asking, but the import
statement imports the module, looks up "a_name" in that module's globals
dict, and binds the same object to a_name in the current module's globals
dict.
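A sketch of the two lookup strategies described above, assuming CPython. Opcode names are an implementation detail that changes between releases (recent versions fuse some into combined instructions such as `LOAD_FAST_LOAD_FAST`), so the code only checks for the `FAST` family rather than exact names. The functions `add_locals` and `add_global` are hypothetical.

```python
import dis

a = 1
b = 2

# Module-level (global) names live in an ordinary dict, looked up by name.
print(globals()['a'], globals()['b'])   # 1 2

def add_locals():
    x = 10
    y = 20
    return x + y

def add_global():
    return a + b

# For function locals, the compiler records the names once, in slot order...
print(add_locals.__code__.co_varnames)  # ('x', 'y')

# ...and the bytecode then refers to locals by slot index (the *_FAST family),
# while globals are still fetched by name (LOAD_GLOBAL).
local_ops = {ins.opname for ins in dis.get_instructions(add_locals)}
global_ops = {ins.opname for ins in dis.get_instructions(add_global)}
print(sorted(op for op in local_ops if "FAST" in op))
print("LOAD_GLOBAL" in global_ops)      # True
```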
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 23:09 +0100 |
| Message-ID | <MPG.2f2e3651befa4c9989694@nntp.aioe.org> |
| In reply to | #84496 |
In article <mailman.18099.1422135976.18130.python-list@python.org>,
ian.g.kelly@gmail.com says...
>
> On Sat, Jan 24, 2015 at 2:14 PM, Mario Figueiredo <marfig@gmail.com> wrote:
> > But that begs the OT question:
>
> No, it doesnt. http://en.wikipedia.org/wiki/Begging_the_question
Cute.
> I'm not sure I'm understanding what you're asking, but the import
> statement imports the module, looks up "a_name" in that module's
> globals dict, and binds the same object to a_name in the current
> module's globals dict.
Meaning the interpreter knows a variable's name. Which would allow it to
produce an error message such as:
AttributeError: 'foo' object has no attribute '__bases__'
For the following code:
class Sub:
    pass
foo = Sub()
foo.__bases__
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-01-25 09:25 +1100 |
| Message-ID | <mailman.18104.1422138354.18130.python-list@python.org> |
| In reply to | #84502 |
On Sun, Jan 25, 2015 at 9:09 AM, Mario Figueiredo <marfig@gmail.com> wrote:
> Meaning the interpreter knows a variable's name. Which would allow it to
> produce an error message such as:
>
> AttributeError: 'foo' object has no attribute '__bases__'
>
> For the following code:
>
> class Sub:
>     pass
>
> foo = Sub()
> foo.__bases__
Let me explain by way of analogy. You have ten shoeboxes to store your
stuff in. I hand you a thing and say "Here, put this into shoebox #4".
Then someone else comes along and says, "I need the thing from shoebox
#4", so you give him that thing. Now, he hands that thing to someone
else and asks him which shoebox it came out of, just by looking at the
thing itself. How can he say? The thing doesn't have any way of
knowing what shoebox it came out of.
Python names reference objects. But once you get an object, there's no
way of knowing which name was used to get to it. There might be one
such name; there might be more than one; and there might not be any.
You can't identify an object by the name it's bound to, but you can
identify it by something that's always true of the object itself, like
its type.
There are a few cases where names are so useful that they get attached
to the objects themselves. The 'def' and 'class' statements create
objects and also record the names used. But you still can't identify
what name was used to reference something:
>>> def func(x): print("x = %s"%x)
...
>>> func(123)
x = 123
>>> show = func
>>> show(234)
x = 234
>>> func = "not a function"
>>> func(123)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable
>>> show(123)
x = 123
>>> show
<function func at 0x7f36744a30d0>
No matter what name I use to reference the function, it's still called
"func". Nobody can ever know that I'm identifying it by the name
'show' now.
ChrisA
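Chris's transcript condensed into a script. The class half is an addition that assumes the `class` statement records its name the same way `def` does (via `__name__`); ordinary instances get no such attribute.

```python
def func(x):
    return "x = %s" % x

show = func              # a second name for the same function object
func = "not a function"  # rebinding the original name does not touch the object

print(show(123))         # x = 123
print(show.__name__)     # func  (recorded by the def statement)

class Sub:
    pass

Alias = Sub
print(Alias.__name__)    # Sub   (recorded by the class statement)

# Ordinary instances carry no such name attribute.
print(hasattr(Alias(), "__name__"))   # False
```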
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 23:33 +0100 |
| Message-ID | <MPG.2f2e3c017bcaae94989696@nntp.aioe.org> |
| In reply to | #84505 |
In article <mailman.18104.1422138354.18130.python-list@python.org>,
rosuav@gmail.com says...
>
> Let me explain by way of analogy.
[snipped]
Gotcha! Thanks for the explanation :)
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-01-25 09:37 +1100 |
| Message-ID | <mailman.18105.1422139063.18130.python-list@python.org> |
| In reply to | #84507 |
On Sun, Jan 25, 2015 at 9:33 AM, Mario Figueiredo <marfig@gmail.com> wrote:
> In article <mailman.18104.1422138354.18130.python-list@python.org>,
> rosuav@gmail.com says...
>>
>> Let me explain by way of analogy.
> [snipped]
>
> Gotcha! Thanks for the explanation :)
Awesome! I'm always a bit wary of analogies... sometimes they're really
helpful, other times they're unhelpful and confusing. Glad this was one of
the better cases.
ChrisA
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 23:59 +0100 |
| Message-ID | <MPG.2f2e42298df5a9c2989698@nntp.aioe.org> |
| In reply to | #84509 |
In article <mailman.18105.1422139063.18130.python-list@python.org>,
rosuav@gmail.com says...
> Awesome! I'm always a bit wary of analogies... sometimes they're
> really helpful, other times they're unhelpful and confusing.
Yeah. Yours was all it took :)
The thing with analogies is to never take them literally. They are
analogies, after all. But there is this old, funny habit we humans seem to
share of dissecting an analogy as if it were a scientific paper.
- You say shoes in a box? Why, but memory addresses aren't boxes. Besides,
  a box can only take shoes this big. Memory addresses can take any size
  object.
- No, I meant... Look, just imagine shoes in a box.
- Alright...
- Now the other person will be handed the shoe you asked for. They don't
  know what box it came from. What this mea...
- How come?
- How come what?
- Why don't they know? They could just agree to know what box the shoe
  came from. Problem solved.
- No, but I am trying to illustrate how it works. Not how it could work.
- I still don't get it. Why does it work like that? Seems stupid...
- It's not. There are specific reasons to not know. It's got to do with
  the process stack and efficiency and...
- Right.
And there's also the most annoying of all, the smartasses who like to stay
hidden in the shadows, and as soon as they see an analogy they jump in
and tada!
"It's not true that memory spaces can hold any object size. It is limited
by the computer's available memory" -- well, duh!
"Is that a float you are using to compute a salary raise in your code
snippet meant as an example to illustrate code syntax? Hahaha" -- Sigh!
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-01-24 16:58 -0500 |
| Message-ID | <mailman.18102.1422136703.18130.python-list@python.org> |
| In reply to | #84493 |
On 1/24/2015 4:14 PM, Mario Figueiredo wrote:
> In article <54c39366$0$13006$c3e8da3$5496439d@news.astraweb.com>,
> steve+comp.lang.python@pearwood.info says...
>>> AttributeError: 'Sub' instance has no attribute '__bases__',
>>> AttributeError: 'foo' object has no attribute '__bases__'
>>
>> The first would be nice. The second is impossible: objects may have no name,
>> one name, or many names, and they do not know what names they are bound to.
>> So the Sub instance bound to the name 'foo' doesn't know that its name
>> is 'foo', so it cannot display it in the error message.
>
> Thanks for the information! :)
>
> But that begs the OT question: How does Python map names to memory
> addresses in the interpreter?
Python the language maps names to objects that have identity, type, and
value. The CPython implementation does the mapping with a hash table and C
pointers (to computer memory addresses), but addresses are not part of the
language definition. Neuroscientists still puzzle over how we do such
mapping.
> "__main__"
> from module import a_name
A module is a namespace associating names with objects. This statement
says to import the a_name to object association from module and add it
to __main__
> y = a_name + 1
This statement uses the imported association in __main__ to access the
object and add 1, and bind 'y' to the resulting object.
--
Terry Jan Reedy
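A sketch of Terry's "module as namespace" description, using `types.ModuleType` to build a stand-in for the hypothetical `module` and a plain dict standing in for `__main__`'s globals. It is a simplification: a real import also registers the module in `sys.modules` and runs the module's code.

```python
import types

# Hypothetical stand-in for 'module': a module object is a namespace whose
# attributes are stored in a dict.
module = types.ModuleType("module")
module.a_name = [1, 2, 3]
print(vars(module)["a_name"])                      # [1, 2, 3]

# Stand-in for __main__'s globals. "from module import a_name" copies the
# name-to-object association, not the object itself.
main_globals = {}
main_globals["a_name"] = vars(module)["a_name"]
print(main_globals["a_name"] is module.a_name)     # True: one object, two namespaces
```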
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 23:02 +0100 |
| Message-ID | <MPG.2f2e34c1b650760e989693@nntp.aioe.org> |
| In reply to | #84500 |
In article <mailman.18102.1422136703.18130.python-list@python.org>,
tjreedy@udel.edu says...
>
> > "__main__"
> > from module import a_name
>
> A module is a namespace associating names with objects. This statement
> says to import the a_name to object association from module and add it
> to __main__
>
> > y = a_name + 1
>
> This statement uses the imported association in __main__ to access the
> object and add 1, and bind 'y' to the resulting object.
But I'm being told the interpreter has no knowledge of a variable name.
So, how does the interpreter know, once it reaches the assignment line
above, how to map a_name to the correct object in memory?
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-01-24 15:16 -0700 |
| Message-ID | <mailman.18103.1422137849.18130.python-list@python.org> |
| In reply to | #84501 |
On Sat, Jan 24, 2015 at 3:02 PM, Mario Figueiredo <marfig@gmail.com> wrote:
> In article <mailman.18102.1422136703.18130.python-list@python.org>,
> tjreedy@udel.edu says...
>>
>> > "__main__"
>> > from module import a_name
>>
>> A module is a namespace associating names with objects. This statement
>> says to import the a_name to object association from module and add it
>> to __main__
>>
>> > y = a_name + 1
>>
>> This statement uses the imported association in __main__ to access the
>> object and add 1, and bind 'y' to the resulting object.
>
>
> But I'm being told the interpreter has no knowledge of a variable name.
> So, how does the interpreter know, once it reaches the assignment line
> above, how to map a_name to the correct object in memory?
No, you're being told that the *object* doesn't know the names of the
variables that it's bound to. In the context above, the variable is
right there under that name in the globals dict, as can be seen in the
disassembly:
>>> import dis
>>> dis.dis("y = a_name + 1")
1 0 LOAD_NAME 0 (a_name)
3 LOAD_CONST 0 (1)
6 BINARY_ADD
7 STORE_NAME 1 (y)
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
Now what happens in the byte code if we try to access an attribute on
that object?
>>> dis.dis("a_name.__bases__")
1 0 LOAD_NAME 0 (a_name)
3 LOAD_ATTR 1 (__bases__)
6 RETURN_VALUE
1) The value of a_name is looked up and pushed onto the stack.
2) The interpreter attempts to load the attribute __bases__ of
whatever object is on the top of the stack. There is no name
associated with that object at this point; it's just an object.
Now imagine if the Python code in question were instead this:
def get_an_object(): return "foo"
get_an_object().__bases__
Would you really expect the interpreter to come up with a message like
"Return value of get_an_object() has no attribute '__bases__'"?
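Ian's two points can be checked directly, assuming CPython: the error raised for `get_an_object().__bases__` can only name the type of the returned object, and the bytecode for `a_name.__bases__` loads the name and then the attribute as separate steps. Only the presence of `LOAD_NAME` and `LOAD_ATTR` is checked, since full listings differ between versions.

```python
import dis

def get_an_object():
    return "foo"

# The attribute lookup acts on whatever object is on top of the stack;
# the message can only mention that object's type.
message = None
try:
    get_an_object().__bases__
except AttributeError as exc:
    message = str(exc)

print(message)   # 'str' object has no attribute '__bases__'

# dis.get_instructions also accepts a source string; here it shows a name
# lookup followed by a separate attribute load.
ops = [ins.opname for ins in dis.get_instructions("a_name.__bases__")]
print("LOAD_NAME" in ops, "LOAD_ATTR" in ops)   # True True
```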
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-24 23:36 +0100 |
| Message-ID | <MPG.2f2e3cae35dc9068989697@nntp.aioe.org> |
| In reply to | #84504 |
In article <mailman.18103.1422137849.18130.python-list@python.org>,
ian.g.kelly@gmail.com says...
>
> No, you're being told that the *object* doesn't know the names of the
> variables that it's bound to. In the context above, the variable is
> right there under that name in the globals dict, as can be seen in the
> disassembly:
[snipped]
Yes. I got it now. I misinterpreted Steven's words.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-01-25 14:18 +1100 |
| Message-ID | <54c4606a$0$13002$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #84493 |
Mario Figueiredo wrote:
> But that begs the OT question: How does Python map names to memory
> addresses in the interpreter?
It doesn't.
You are thinking of an execution model like C or Pascal, where variables are
best thought of as fixed memory addresses. But Python, like many modern
languages (Java, Ruby, Lua, ...) uses a name-binding model.
Semantically, the fixed memory address model means that each variable is
like a fixed-width bucket, where the size depends on the type. That's why
the compiler needs to associate a fixed type with each variable, so it
knows how much space to allocate and how to initialise the bytes:
m = [0000]
n = [0000]
x = [8FFFFFFF]
Assigning a value to a variable ("m = 42", hex 2A) results in the compiler
storing that value in the bucket; assignment makes a copy: "n = m" leads to
two independent buckets with the same value:
m = [002A]
n = [002A]
Values and variables are dependent on each other. You can't have a variable
with no value, and you can't have a value with no variable. (This is an
over-simplification, but mostly true.)
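To make the fixed-bucket model concrete from within Python, the sketch below borrows `ctypes.c_int16` objects as stand-ins for two 16-bit C variables. This only mimics C semantics for illustration; it is not how ordinary Python variables behave.

```python
import ctypes

# Two fixed-size "buckets", each with its own fixed address, as in C.
m = ctypes.c_int16(0)
n = ctypes.c_int16(0)

m.value = 0x2A          # "m = 42" stores the bits in m's bucket
n.value = m.value       # "n = m" copies the bits into n's bucket

print(hex(m.value), hex(n.value))                   # 0x2a 0x2a
print(ctypes.addressof(m) != ctypes.addressof(n))   # True: two separate buckets

m.value = 7             # changing one bucket leaves the other untouched
print(n.value)          # 42
```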
The name-binding model is different. Values (objects) and names are
independent. Values can exist even if they have no name (although the
garbage collector will delete them as soon as they are unused). The
compiler associates a name to a value. One good mental model is to think of
the compiler attaching a piece of string from the name to its associated
object. Assigning n = m means that both names end up tied to the same
object, and there can be objects with no associated name. (So long as
*something* refers to them, the garbage collector will leave them be.)
m -----------------+----------- 0x2A
n ----------------/
x ----------------------------- 1.2345
s -----\ "Hello world"
+---------------------- "Goodbye now"
Under the hood, this is usually implemented using pointers. If you are
familiar with pointer semantics, you might think of these pieces of string
as pointers, except that you cannot do pointer arithmetic on them. But that
is merely the *implementation* of the language's variable model.
In Python, global variables use a dict, and there is a function to retrieve
that dict:
py> d = globals()
py> d['x'] = 23 # don't do this!
py> x
23
It's not *wrong* or forbidden to write to globals this way. It's just
unnecessary.
> "__main__"
> from module import a_name
> y = a_name + 1
>
> How does the Python interpreter know how to map 'name' to the correct memory
> location, if this __main__ code only runs after the 'module' code?
When the statement `from module import a_name` executes, Python:
(1) imports module;
(2) looks up "a_name" in module's namespace (a dict);
(3) creates an entry "a_name" in the current namespace (assuming
one doesn't already exist);
(4) and binds it to the object found in Step 2.
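Steps (1) to (4) above can be mimicked with a hypothetical helper, `from_import`. It is a simplified sketch: it reads the module's `__dict__` through `vars()`, whereas the real statement performs an attribute lookup with extra fallbacks (for example, a module-level `__getattr__`). `math.pi` is used only as a convenient real module attribute.

```python
import importlib
import math

def from_import(module_name, attr, namespace):
    """Hypothetical helper mimicking `from module_name import attr`."""
    module = importlib.import_module(module_name)  # (1) import the module
    obj = vars(module)[attr]                       # (2) look up attr in its dict
    namespace[attr] = obj                          # (3)-(4) create entry, bind it
    return obj

current = {}                 # stand-in for the importing module's namespace
from_import("math", "pi", current)
print(current["pi"] is math.pi)   # True: the same object, nothing copied
```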
When it executes `y = a_name + 1`, Python:
(1) looks up the name "a_name" in the current namespace;
(2) creates the anonymous object 1, unbound to any name;
(3) calls + with those two objects;
(4) which (assuming it succeeds) creates a new object;
(5) creates an entry "y" in the current namespace;
(6) and binds it to the object returned in Step 4.
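The six steps for `y = a_name + 1` can be spelled out by hand with a dict standing in for the current namespace, then compared with what `exec` produces for the real statement. `operator.add` is used here to make the "call + with those two objects" step explicit; the value 41 is an arbitrary assumption.

```python
import operator

namespace = {"a_name": 41}             # stand-in for the current module's globals

# (1) look up the name "a_name" in the current namespace
value = namespace["a_name"]
# (2)-(4) call + on that object and the constant 1; a new object comes back
result = operator.add(value, 1)
# (5)-(6) create the entry "y" and bind it to the new object
namespace["y"] = result
print(namespace["y"])                  # 42

# The real statement, executed against a fresh dict namespace, agrees.
real = {"a_name": 41}
exec("y = a_name + 1", real)
print(real["y"])                       # 42
```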
--
Steven
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-25 12:07 +0100 |
| Message-ID | <MPG.2f2eecca37bc450998969a@nntp.aioe.org> |
| In reply to | #84548 |
In article <54c4606a$0$13002$c3e8da3$5496439d@news.astraweb.com>,
steve+comp.lang.python@pearwood.info says...
>
> It doesn't.
Your explanation was great, Steven. Thank you. But it raises one question...
>
> Assigning a value to a variable ("m = 42", hex 2A) results in the compiler
> storing that value in the bucket; assignment makes a copy: "n = m" leads to
> two independent buckets with the same value:
>
> m = [002A]
> n = [002A]
I'm still in the process of learning Python. So, that's probably why
this is unexpected to me.
I was under the impression that what python did was keep a lookup table
pointing to memory. Every variable gets an entry as type descriptor and
a pointer to a memory address, where the variable data resides.
(UDTs may be special in that they would have more than one entry, one
for each enclosing def and declared attribute)
In the example above, the n and m buckets would hold pointers, not
binary values. And because they are immutable objects, n and m pointers
would be different. Not equal. But in the case of mutable objects, n = m
would result in m having the same pointer address as n.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-01-25 23:00 +1100 |
| Message-ID | <54c4dae1$0$13005$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #84563 |
Mario Figueiredo wrote:
> In article <54c4606a$0$13002$c3e8da3$5496439d@news.astraweb.com>,
> steve+comp.lang.python@pearwood.info says...
>>
>> It doesn't.
>
> Your explanation was great, Steven. Thank you. But it raises one question...
>
>>
>> Assigning a value to a variable ("m = 42", hex 2A) results in the
>> compiler storing that value in the bucket; assignment makes a copy: "n =
>> m" leads to two independent buckets with the same value:
>>
>> m = [002A]
>> n = [002A]
Maybe I wasn't clear enough. The above is used by languages like C or
Pascal, which use fixed memory locations for variables. If I gave the
impression this was Python, I am sorry.
> I'm still in the process of learning Python. So, that's probably why
> this is unexpected to me.
>
> I was under the impression that what python did was keep a lookup table
> pointing to memory. Every variable gets an entry as type descriptor and
> a pointer to a memory address, where the variable data resides.
This sounds more or less correct, at least for CPython. CPython is
the "reference implementation", and probably the version you use when you
run Python. But it is not the only one, and they can be different.
(E.g. in Jython, the Python interpreter is built using Java, not C. You
can't work with pointers to memory addresses in Java, and the Java garbage
collector is free to move objects around when needed.)
In CPython, objects live in the heap, and Python tracks them using a
pointer. So when you bind a name to a value:
x = 23 # what you type
what happens is that Python sets a key + value in the global namespace (a
dictionary):
globals()['x'] = 23 # what Python runs
and the globals() dict will then look something like this:
{'x': 23, 'colour': 'red', 'y': 42}
(Note: *local* variables are similar but not quite the same. They're also
more complicated, so let's skip them for now.)
What happens inside the dictionary? Dictionaries are "hash tables", so they
are basically a big array of cells, and each cell is a pair of pointers,
one for the key and one for the value:
[dictionary header]
[blank]
[blank]
[ptr to the string 'y', ptr to the int 42]
[blank]
[ptr to 'x', ptr to 23]
[blank]
[blank]
[blank]
[ptr to 'colour', ptr to 'red']
[blank]
...
Notice that the order is unpredictable. Also, don't take this picture too
literally. Dicts are highly optimized, highly tuned and in active
development, the *actual* design of Python dicts may vary. But this is a
reasonable simplified view of how they could be designed.
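The cell picture above can be turned into a small toy, under loudly stated assumptions: `ToyDict` is hypothetical, uses simple linear probing (CPython's real probe sequence differs), resizes at an assumed 2/3 load factor, and omits deletion. It does show the essential idea that each occupied cell holds a key/value pair of references and that blank cells are part of the design.

```python
class ToyDict:
    """Hypothetical open-addressing hash table (not CPython's implementation).

    Each cell is either None (blank) or a (key, value) pair, mirroring the
    "pair of pointers" picture; linear probing resolves collisions.
    """

    def __init__(self, size=8):
        self.cells = [None] * size
        self.count = 0

    def _slot(self, key):
        # Start at hash(key) mod size and step forward until we find the key
        # or a blank cell. The load limit guarantees a blank cell exists.
        i = hash(key) % len(self.cells)
        while self.cells[i] is not None and self.cells[i][0] != key:
            i = (i + 1) % len(self.cells)
        return i

    def __setitem__(self, key, value):
        if (self.count + 1) * 3 > len(self.cells) * 2:   # keep at most 2/3 full
            self._resize(len(self.cells) * 2)
        i = self._slot(key)
        if self.cells[i] is None:
            self.count += 1
        self.cells[i] = (key, value)

    def __getitem__(self, key):
        cell = self.cells[self._slot(key)]
        if cell is None:
            raise KeyError(key)
        return cell[1]

    def _resize(self, new_size):
        old = [c for c in self.cells if c is not None]
        self.cells = [None] * new_size
        self.count = 0
        for k, v in old:
            self[k] = v


d = ToyDict()
d["x"] = 23
d["colour"] = "red"
d["y"] = 42
print(d["y"])      # 42
print(d.cells)     # cell positions depend on string hashing, which varies per run
```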
The important thing to remember is that while CPython uses pointers under
the hood to make the interpreter work, pointers are not part of the Python
language. There is no way in Python to get a pointer to an object, or
increment a pointer, or dereference a pointer. You just use objects, and
the interpreter handles all the pointer stuff behind the scenes.
> (UDTs may be special in that they would have more than one entry, one
> for each enclosing def and declared attribute)
>
> In the example above, the n and m buckets would hold pointers, not
> binary values. And because they are immutable objects, n and m pointers
> would be different. Not equal. But in the case of mutable objects, n = m
> would result in m having the same pointer address as n.
No, this is certainly not the case! Python uses *exactly* the same rules for
mutable and immutable objects. In fact, Python can't tell which values are
mutable or immutable until it tries to modify them.
Remember I said that name-binding languages operate using a model of pieces
of string between the name and the object? Here are two names bound to the
same object:
m -----------+--------------- 0x2a
n ----------/
Obviously Python doesn't *literally* use a piece of string :-) so what
happens under the hood? Pointers again, at least in CPython.
In this case, if we look deep inside our globals dictionary again, we will
see two cells:
[ptr to the string "m", ptr to the int 0x2a]
[ptr to the string "n", ptr to the int 0x2a]
The two int pointers point to the same object. This is guaranteed by the
language:
m = 42
n = m
assert id(m) == id(n)
id(m) and id(n) are equal because both names refer to the same object at the
same memory location. Assignment in Python NEVER makes a copy of the value
being assigned.
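A short sketch of the rule that assignment never copies, for mutable and immutable objects alike. The difference between the two kinds only shows up when code tries to modify the object in place.

```python
# Assignment binds another name to the same object, whatever its type.
m = 42
n = m
print(m is n, id(m) == id(n))   # True True

a = [1, 2, 3]
b = a
print(a is b)                   # True

# Lists support in-place mutation, visible through both names...
b.append(4)
print(a)                        # [1, 2, 3, 4]

# ...ints do not, so += rebinds n to a different object.
n += 1
print(m, n, m is n)             # 42 43 False

# An explicit copy is needed to get an independent object.
c = list(a)
print(c == a, c is a)           # True False
```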
--
Steven
| From | Mario Figueiredo <marfig@gmail.com> |
|---|---|
| Date | 2015-01-25 13:49 +0100 |
| Message-ID | <MPG.2f2f0499338c0a6598969b@nntp.aioe.org> |
| In reply to | #84564 |
In article <54c4dae1$0$13005$c3e8da3$5496439d@news.astraweb.com>,
steve+comp.lang.python@pearwood.info says...
> [...]
Most excellent. Thanks for the hard work explaining this to me. :)
Knowing Python internals is something that will end up benefiting me in the
long run. There's much to be gained by knowing the inner workings of your
programming language...
Python is missing an under-the-hood book, I suppose. Tracing through Python
source code to learn this stuff isn't easy unless we know what we are
looking for.
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2015-01-25 14:53 +0200 |
| Message-ID | <87lhkrc97t.fsf@elektro.pacujo.net> |
| In reply to | #84566 |
Mario Figueiredo <marfig@gmail.com>:
> Knowing Python internals is something that will end up benefiting me in
> the long run. There's much to be gained by knowing the inner workings
> of your programming language...
>
> Python is missing an under-the-hood book, I suppose. Tracing through
> Python source code to learn this stuff isn't easy unless we know what
> we are looking for.
One must only be careful to distinguish implementation choices from the
abstract definitions. The Python Language Reference is the official
standard:
<URL: https://docs.python.org/3/reference/index.html>
Marko
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-01-25 16:35 -0500 |
| Message-ID | <mailman.18135.1422221769.18130.python-list@python.org> |
| In reply to | #84564 |
On 1/25/2015 7:00 AM, Steven D'Aprano wrote:
> What happens inside the dictionary? Dictionaries are "hash tables", so they
> are basically a big array of cells, and each cell is a pair of pointers,
> one for the key and one for the value:
>
> [dictionary header]
> [blank]
> [blank]
> [ptr to the string 'y', ptr to the int 42]
At the moment, for CPython, each entry has 3 items, with the first being
the cached hash of the key. Hash comparison is first used to test
whether keys are equal.
[hash('y'), ptr('y'), ptr(42)]
> [blank]
> [ptr to 'x', ptr to 23]
> [blank]
> [blank]
> [blank]
> [ptr to 'colour', ptr to 'red']
> [blank]
As you say, these are implementation details. CPython dicts for the
instances of at least some classes have a different, specialized
structure, with two arrays.
In the above, [blank] entries, which are about 1/2 to 2/3 of the
entries, take the same space as real entries (12 to 24 bytes). Raymond
H. has proposed that the standard dict have two arrays like so:
1. the first array is a sparse array of indexes into the second array:
[b, b, 2, b, 0, b, b, b, 1, b] (where b might be -1 interpreted as
<blank>), using only as many bytes as needed for the maximum index.
2. the second array is a compact array of entries in insertion order,
such as
[hash, ptr to 'x', ptr to 23]
[hash, ptr to 'colour', ptr to 'red']
[hash, ptr to the string 'y', ptr to the int 42]
Iteration would use the compact array, making all dicts OrderedDicts.
PyPy has already switched to this. It seems that on modern processors
with multilevel on-chip caches, the space reduction leads to cache-miss
reductions that compensate for the indirection cost.
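Raymond's proposed layout can be sketched as a toy class. `CompactDict` is hypothetical and simplified: it uses linear probing and an assumed 2/3 load factor, stores the cached hash as Terry describes, and omits deletion entirely (so it says nothing about Ian's question on deletion cost). Iterating over the dense `entries` list gives insertion order for free.

```python
class CompactDict:
    """Hypothetical toy version of the two-array dict layout.

    indices: sparse table of positions into `entries` (or EMPTY)
    entries: dense list of [hash, key, value] in insertion order
    """

    EMPTY = -1

    def __init__(self, size=8):
        self.indices = [self.EMPTY] * size
        self.entries = []

    def _find(self, key):
        """Return the index-table slot holding key, or the first empty slot."""
        h = hash(key)
        i = h % len(self.indices)
        while True:
            pos = self.indices[i]
            if pos == self.EMPTY:
                return i
            eh, ek, _ = self.entries[pos]
            if eh == h and ek == key:         # cheap hash check first, then keys
                return i
            i = (i + 1) % len(self.indices)   # linear probing (an assumption)

    def __setitem__(self, key, value):
        slot = self._find(key)
        pos = self.indices[slot]
        if pos != self.EMPTY:
            self.entries[pos][2] = value      # existing key: update in place
            return
        self.indices[slot] = len(self.entries)
        self.entries.append([hash(key), key, value])
        if len(self.entries) * 3 >= len(self.indices) * 2:
            self._rebuild(len(self.indices) * 2)

    def __getitem__(self, key):
        pos = self.indices[self._find(key)]
        if pos == self.EMPTY:
            raise KeyError(key)
        return self.entries[pos][2]

    def _rebuild(self, size):
        # Only the small index table is rebuilt; entries keep their order.
        self.indices = [self.EMPTY] * size
        for pos, (h, _key, _value) in enumerate(self.entries):
            i = h % size
            while self.indices[i] != self.EMPTY:
                i = (i + 1) % size
            self.indices[i] = pos

    def __iter__(self):
        # Iteration walks the compact array, so keys come out in insertion order.
        return (key for _h, key, _v in self.entries)


cd = CompactDict()
for key, value in [("x", 23), ("colour", "red"), ("y", 42)]:
    cd[key] = value
print(list(cd))      # ['x', 'colour', 'y']
print(cd.indices)    # sparse; positions vary with string hash randomization
```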
--
Terry Jan Reedy
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-01-25 19:21 -0700 |
| Message-ID | <mailman.18139.1422238882.18130.python-list@python.org> |
| In reply to | #84564 |
On Jan 25, 2015 2:37 PM, "Terry Reedy" <tjreedy@udel.edu> wrote:
> 2. the second array is a compact array of entries in insertion order,
> such as
>
> [hash, ptr to 'x', ptr to 23]
> [hash, ptr to 'colour', ptr to 'red']
> [hash, ptr to the string 'y', ptr to the int 42]
>
> Iteration would use the compact array, making all dicts OrderedDicts.
> PyPy has already switched to this. It seems that on modern processors
> with multilevel on-chip caches, the space reduction leads to cache-miss
> reductions that compensate for the indirection cost.
Deletion becomes O(n) though. Has there been any investigation into how
commonly deletion of keys is done?