Groups > comp.lang.python > #90578 > unrolled thread

Building CPython

Started by	BartC <bc@freeuk.com>
First post	2015-05-13 20:36 +0100
Last post	2015-05-17 14:41 +0100
Articles	17 on this page of 57 — 13 participants

Back to article view | Back to comp.lang.python

  Building CPython BartC <bc@freeuk.com> - 2015-05-13 20:36 +0100
    Re: Building CPython Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-13 20:58 +0100
    Re: Building CPython Terry Reedy <tjreedy@udel.edu> - 2015-05-13 18:34 -0400
      Re: Building CPython BartC <bc@freeuk.com> - 2015-05-14 16:51 +0100
        Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 02:09 +1000
          Re: Building CPython BartC <bc@freeuk.com> - 2015-05-14 18:02 +0100
            Re: Building CPython Dave Angel <davea@davea.name> - 2015-05-14 13:10 -0400
            Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 03:11 +1000
              Re: Building CPython BartC <bc@freeuk.com> - 2015-05-14 18:32 +0100
                Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 03:45 +1000
            Re: Building CPython Terry Reedy <tjreedy@udel.edu> - 2015-05-14 14:50 -0400
              Re: Building CPython Christian Gollwitzer <auriocus@gmx.de> - 2015-05-15 12:51 +0200
                Re: Building CPython Terry Reedy <tjreedy@udel.edu> - 2015-05-15 17:19 -0400
        Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-14 19:29 +0300
          Re: Building CPython BartC <bc@freeuk.com> - 2015-05-14 22:55 +0100
            Re: Building CPython MRAB <python@mrabarnett.plus.com> - 2015-05-14 23:19 +0100
            Re: Building CPython BartC <bc@freeuk.com> - 2015-05-15 01:50 +0100
              Re: Building CPython Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-05-15 18:05 +1200
                Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-15 11:59 +0300
                  Re: Building CPython wxjmfauth@gmail.com - 2015-05-15 02:07 -0700
                    Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-15 12:20 +0300
                      Re: Building CPython wxjmfauth@gmail.com - 2015-05-15 02:51 -0700
                        Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-15 13:52 +0300
                          Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-15 22:10 +1000
                            Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 22:34 +1000
                            Re: Building CPython wxjmfauth@gmail.com - 2015-05-15 07:11 -0700
                          Re: Building CPython Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-15 13:41 +0100
                      Re: Building CPython Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-15 13:38 +0100
                  Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 19:43 +1000
                    Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-15 13:50 +0300
                      Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-15 22:43 +1000
                        Re: Building CPython Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-15 09:00 -0600
                        Re: Building CPython Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-15 09:04 -0600
                        Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-16 01:06 +1000
                  Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-15 20:14 +1000
                    Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-15 20:25 +1000
                  Re: Building CPython Terry Reedy <tjreedy@udel.edu> - 2015-05-15 17:05 -0400
                  Re: Building CPython BartC <bc@freeuk.com> - 2015-05-15 22:54 +0100
                    Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-16 01:44 +0300
                      Re: Building CPython Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-16 00:27 +0100
                        Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-16 11:55 +1000
                          Re: Building CPython Chris Angelico <rosuav@gmail.com> - 2015-05-16 12:15 +1000
                          Re: Building CPython Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-16 03:17 +0100
                      Re: Building CPython BartC <bc@freeuk.com> - 2015-05-16 01:43 +0100
                        Re: Building CPython MRAB <python@mrabarnett.plus.com> - 2015-05-16 02:16 +0100
                        Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-16 11:08 +0300
                          Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-16 19:40 +1000
                            Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-16 16:59 +0300
                              Re: Building CPython Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-17 04:18 +1000
                                Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-16 21:55 +0300
                                  Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-17 01:51 +0300
                                    Re: Building CPython Marko Rauhamaa <marko@pacujo.net> - 2015-05-17 23:49 +0300
                    Re: Building CPython Terry Reedy <tjreedy@udel.edu> - 2015-05-15 19:54 -0400
                Re: Building CPython BartC <bc@freeuk.com> - 2015-05-15 10:32 +0100
                  Re: Building CPython Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-05-16 12:55 +1200
                    Re: Building CPython Jonas Wielicki <jonas@wielicki.name> - 2015-05-17 14:25 +0200
                      Re: Building CPython BartC <bc@freeuk.com> - 2015-05-17 14:41 +0100

Page 3 of 3 — ← Prev page 1 2 [3]

#90709

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-16 11:55 +1000
Message-ID	<5556a3a5$0$12989$c3e8da3$5496439d@news.astraweb.com>
In reply to	#90697

On Sat, 16 May 2015 09:27 am, Mark Lawrence wrote:

> On 15/05/2015 23:44, Marko Rauhamaa wrote:
>> BartC <bc@freeuk.com>:
>>
>>> What /is/ a method lookup? Is it when you have this:
>>>
>>>   A.B()
>>>
>>> and need to find whether the expression A (or its class or type) has a
>>> name B associated with it? (And it then needs to check whether B is
>>> something that can be called.)
>>>
>>> If so, does that have to be done using Python's Dict mechanism? (Ie.
>>> searching for a key 'B' by name and seeing if the object associated
>>> with it is a method. That does not sound efficient.)

It's not as inefficient as you may think. Dicts are hash tables, and hash
tables are a standard computer science data structure for performing very
fast searches at constant (or near constant) time.

The dict is basically an array of pointers to (key, value). To look a name
up in the dict, you hash the string which gives you an index into the
array, then look at that position. If it is blank, you know there is no
match. If it points to a string, you compare that string to your string. If
they are equal, then you have a match. If they aren't equal, you have a
collision, and you have to look elsewhere (details differ) but typically
you don't end up looking in more than one or two positions. So all pretty
fast, and close enough to constant time.

To speed things up even further, I think that the hash value is cached with
the string, so it only needs to be calculated the first time.

>> That is a general feature among high-level programming languages. In
>> Python, it is even more complicated:
>>
>>   * first the object's dict is looked up for the method name
>>
>>   * if the method is not found (it usually isn't), the dict of the
>>     object's class is consulted
>>
>>   * if the method is found (it usually is), a function object is
>>     instantiated that delegates to the class's method and embeds a "self"
>>     reference to the object to the call

It's the other way around. The function object already exists: you created
it by writing `def method(self, *args): ... ` inside the class body. def
always makes a function. It's the *method* object which is created on the
fly, delegating to the function.

>> IOW, two dict lookups plus an object construction for each method call.
>>
>>
>> Marko
>>
> 
> As a picture paints a thousand words is anybody aware of a site or sites
> that show this diagramatically, as I think I and possibly others would
> find it far easier to grasp.

No I'm not aware of any such site, but I can try to make it more obvious
with an example.

Suppose we have a hierarchy of classes, starting from the root of the
hierarchy (object) to a specific instance:

class Animal(object): 
    pass

class Mammal(Animal):
    pass

class Dog(Mammal):
    def bark(self): ...

laddy = Dog()

We then look up a method:

laddy.bark()

In a non-dynamic language like Java, the compiler knows exactly where bark
is defined ("in the Dog class") and can call it directly. In dynamic
languages like Python, the compiler can't be sure that bark hasn't been
shadowed or overridden at runtime, so it has to search for the first match
found. Simplified:

* Does laddy.__dict__ contain the key "bark"? If so, we have a match.

* For each class in the MRO (Method Resolution Order), namely 
  [Dog, Mammal, Animal, object], does the class __dict__ contain the
  key "bark"? If so, we have a match.

* Do any of those classes in the MRO have a __getattr__ method? If
  so, then try calling __getattr__("bark") and see if it returns 
  a match.

* Give up and raise AttributeError.

[Aside: some of the things I haven't covered: __slots__, __getattribute__,
how metaclasses can affect this.]

In the case of laddy.bark, the matching attribute is found as
Dog.__dict__['bark']:

    temp = Dog.__bark__['bark']  # laddy.bark, is a function

At this point, the descriptor protocol applies. You can ignore this part if
you like, and just pretend that laddy.bark returns a method instead of a
function, but if you want to know what actually happens in all it's gory
details, it is something like this (again, simplified):

* Does the attribute have a __get__ method? If not, then we just 
  use the object as-is, with no changes.

* But if it does have a __get__ method, then it is a descriptor and
  we call the __get__ method to get the object we actually use.

Since functions are descriptors, we get this:

    temp = temp.__get__(laddy, Dog)  # returns a method object

and finally we can call the method:

    temp()  # laddy.bark()

None of these individual operations are particularly expensive, nor are
there a lot of them. For a typical instance, the MRO usually contains only
two or three classes, and __dict__ lookups are fast. Nevertheless, even
though each method lookup is individually *almost* as fast as the sort of
pre-compiled all-but-instantaneous access which Java can do, it all adds
up. So in Java, a long chain of dots:

    foo.bar.baz.foobar.spam.eggs.cheese

can be resolved at compile-time, and takes no more time than

    foo.cheese

but in Python's case it has to be resolved at run-time, so if you care about
speed, you should try to avoid long chains of dots in performance critical
loops. E.g. instead of:

    for x in sequence:
        foo.bar.baz.foobar.spam.eggs.cheese(x)

you can write:

    cheese = foo.bar.baz.foobar.spam.eggs.cheese
    for x in sequence:
        cheese(x)

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90711

From	Chris Angelico <rosuav@gmail.com>
Date	2015-05-16 12:15 +1000
Message-ID	<mailman.57.1431742547.17265.python-list@python.org>
In reply to	#90709

On Sat, May 16, 2015 at 11:55 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> but in Python's case it has to be resolved at run-time, so if you care about
> speed, you should try to avoid long chains of dots in performance critical
> loops. E.g. instead of:
>
>     for x in sequence:
>         foo.bar.baz.foobar.spam.eggs.cheese(x)
>
> you can write:
>
>     cheese = foo.bar.baz.foobar.spam.eggs.cheese
>     for x in sequence:
>         cheese(x)

Like all advice, of course, this should not be taken on its own. Some
code tries to avoid long chains of dots by adding wrapper methods:

for x in sequence:
    foo.cheese(x)

where foo.cheese() delegates to self.bar.cheese() and so on down the
line. That, of course, will be far FAR slower, in pretty much any
language (unless the compiler can in-line the code completely, in
which case it's effectively being optimized down to the original
anyway); the dots aren't as bad as you might think :)

Just always remember the one cardinal rule of optimization: MEASURE
FIRST. You have no idea how slow something is until you measure it.

(I'm not addressing my comments to Steven here, who I'm fairly sure is
aware of all of this(!), but it's his post that gave the example that
I'm quoting.)

ChrisA

[toc] | [prev] | [next] | [standalone]

#90713

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-05-16 03:17 +0100
Message-ID	<mailman.59.1431742804.17265.python-list@python.org>
In reply to	#90709

On 16/05/2015 02:55, Steven D'Aprano wrote:
> On Sat, 16 May 2015 09:27 am, Mark Lawrence wrote:
>
>> On 15/05/2015 23:44, Marko Rauhamaa wrote:
>>> BartC <bc@freeuk.com>:
>>>
>>>> What /is/ a method lookup? Is it when you have this:
>>>>
>>>>    A.B()
>>>>
>>>> and need to find whether the expression A (or its class or type) has a
>>>> name B associated with it? (And it then needs to check whether B is
>>>> something that can be called.)
>>>>
>>>> If so, does that have to be done using Python's Dict mechanism? (Ie.
>>>> searching for a key 'B' by name and seeing if the object associated
>>>> with it is a method. That does not sound efficient.)
>
> It's not as inefficient as you may think. Dicts are hash tables, and hash
> tables are a standard computer science data structure for performing very
> fast searches at constant (or near constant) time.
>
> The dict is basically an array of pointers to (key, value). To look a name
> up in the dict, you hash the string which gives you an index into the
> array, then look at that position. If it is blank, you know there is no
> match. If it points to a string, you compare that string to your string. If
> they are equal, then you have a match. If they aren't equal, you have a
> collision, and you have to look elsewhere (details differ) but typically
> you don't end up looking in more than one or two positions. So all pretty
> fast, and close enough to constant time.
>
> To speed things up even further, I think that the hash value is cached with
> the string, so it only needs to be calculated the first time.
>
>
>
>>> That is a general feature among high-level programming languages. In
>>> Python, it is even more complicated:
>>>
>>>    * first the object's dict is looked up for the method name
>>>
>>>    * if the method is not found (it usually isn't), the dict of the
>>>      object's class is consulted
>>>
>>>    * if the method is found (it usually is), a function object is
>>>      instantiated that delegates to the class's method and embeds a "self"
>>>      reference to the object to the call
>
> It's the other way around. The function object already exists: you created
> it by writing `def method(self, *args): ... ` inside the class body. def
> always makes a function. It's the *method* object which is created on the
> fly, delegating to the function.
>
>
>
>>> IOW, two dict lookups plus an object construction for each method call.
>>>
>>>
>>> Marko
>>>
>>
>> As a picture paints a thousand words is anybody aware of a site or sites
>> that show this diagramatically, as I think I and possibly others would
>> find it far easier to grasp.
>
> No I'm not aware of any such site, but I can try to make it more obvious
> with an example.
>
> Suppose we have a hierarchy of classes, starting from the root of the
> hierarchy (object) to a specific instance:
>
> class Animal(object):
>      pass
>
> class Mammal(Animal):
>      pass
>
> class Dog(Mammal):
>      def bark(self): ...
>
> laddy = Dog()
>
>
> We then look up a method:
>
> laddy.bark()
>
> In a non-dynamic language like Java, the compiler knows exactly where bark
> is defined ("in the Dog class") and can call it directly. In dynamic
> languages like Python, the compiler can't be sure that bark hasn't been
> shadowed or overridden at runtime, so it has to search for the first match
> found. Simplified:
>
> * Does laddy.__dict__ contain the key "bark"? If so, we have a match.
>
> * For each class in the MRO (Method Resolution Order), namely
>    [Dog, Mammal, Animal, object], does the class __dict__ contain the
>    key "bark"? If so, we have a match.
>
> * Do any of those classes in the MRO have a __getattr__ method? If
>    so, then try calling __getattr__("bark") and see if it returns
>    a match.
>
> * Give up and raise AttributeError.
>
> [Aside: some of the things I haven't covered: __slots__, __getattribute__,
> how metaclasses can affect this.]
>
> In the case of laddy.bark, the matching attribute is found as
> Dog.__dict__['bark']:
>
>      temp = Dog.__bark__['bark']  # laddy.bark, is a function
>
> At this point, the descriptor protocol applies. You can ignore this part if
> you like, and just pretend that laddy.bark returns a method instead of a
> function, but if you want to know what actually happens in all it's gory
> details, it is something like this (again, simplified):
>
> * Does the attribute have a __get__ method? If not, then we just
>    use the object as-is, with no changes.
>
> * But if it does have a __get__ method, then it is a descriptor and
>    we call the __get__ method to get the object we actually use.
>
>
> Since functions are descriptors, we get this:
>
>      temp = temp.__get__(laddy, Dog)  # returns a method object
>
> and finally we can call the method:
>
>      temp()  # laddy.bark()
>
>
> None of these individual operations are particularly expensive, nor are
> there a lot of them. For a typical instance, the MRO usually contains only
> two or three classes, and __dict__ lookups are fast. Nevertheless, even
> though each method lookup is individually *almost* as fast as the sort of
> pre-compiled all-but-instantaneous access which Java can do, it all adds
> up. So in Java, a long chain of dots:
>
>      foo.bar.baz.foobar.spam.eggs.cheese
>
> can be resolved at compile-time, and takes no more time than
>
>      foo.cheese
>
> but in Python's case it has to be resolved at run-time, so if you care about
> speed, you should try to avoid long chains of dots in performance critical
> loops. E.g. instead of:
>
>      for x in sequence:
>          foo.bar.baz.foobar.spam.eggs.cheese(x)
>
> you can write:
>
>      cheese = foo.bar.baz.foobar.spam.eggs.cheese
>      for x in sequence:
>          cheese(x)
>

Thanks for taking the time over this, I'll study it later today after 
hopefully a good night's sleep :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#90701

From	BartC <bc@freeuk.com>
Date	2015-05-16 01:43 +0100
Message-ID	<bpw5x.561870$Zf4.150532@fx22.am4>
In reply to	#90696

On 15/05/2015 23:44, Marko Rauhamaa wrote:
> BartC <bc@freeuk.com>:
>
>> What /is/ a method lookup? Is it when you have this:
>>
>>   A.B()
>>
>> and need to find whether the expression A (or its class or type) has a
>> name B associated with it? (And it then needs to check whether B is
>> something that can be called.)
>>
>> If so, does that have to be done using Python's Dict mechanism? (Ie.
>> searching for a key 'B' by name and seeing if the object associated
>> with it is a method. That does not sound efficient.)
>
> That is a general feature among high-level programming languages. In
> Python, it is even more complicated:
>
>   * first the object's dict is looked up for the method name
>
>   * if the method is not found (it usually isn't), the dict of the
>     object's class is consulted
>
>   * if the method is found (it usually is), a function object is
>     instantiated that delegates to the class's method and embeds a "self"
>     reference to the object to the call
>
> IOW, two dict lookups plus an object construction for each method call.

OK, I didn't know that objects have their own set of attributes that are 
distinct from the class they belong to. I really ought to learn more 
Python!.

(Yet, I have this crazy urge now to create my own bytecode interpreter 
for, if not exactly Python itself, then an equivalent language. Just to 
see if I can do any better than CPython, given the same language 
restraints.

Although I'm hampered a little by not knowing Python well enough. Nor 
OOP, but those are minor details... Anyway it sounds more fun than 
trying to decipher the layers of macros and conditional code that appear 
to be the CPython sources.)

 > IOW, two dict lookups plus an object construction for each method call.

I suppose in many cases an object will have no attributes of its own, 
and so it can rapidly bypass the first lookup. I don't understand the 
need for an object creation (to represent A.B so that it can call it?) 
but perhaps such an object can already exist, prepared ready for use.

-- 
Bartc

[toc] | [prev] | [next] | [standalone]

#90704

From	MRAB <python@mrabarnett.plus.com>
Date	2015-05-16 02:16 +0100
Message-ID	<mailman.53.1431739023.17265.python-list@python.org>
In reply to	#90701

On 2015-05-16 01:43, BartC wrote:
> On 15/05/2015 23:44, Marko Rauhamaa wrote:
>> BartC <bc@freeuk.com>:
>>
>>> What /is/ a method lookup? Is it when you have this:
>>>
>>>   A.B()
>>>
>>> and need to find whether the expression A (or its class or type) has a
>>> name B associated with it? (And it then needs to check whether B is
>>> something that can be called.)
>>>
>>> If so, does that have to be done using Python's Dict mechanism? (Ie.
>>> searching for a key 'B' by name and seeing if the object associated
>>> with it is a method. That does not sound efficient.)
>>
>> That is a general feature among high-level programming languages. In
>> Python, it is even more complicated:
>>
>>   * first the object's dict is looked up for the method name
>>
>>   * if the method is not found (it usually isn't), the dict of the
>>     object's class is consulted
>>
>>   * if the method is found (it usually is), a function object is
>>     instantiated that delegates to the class's method and embeds a "self"
>>     reference to the object to the call
>>
>> IOW, two dict lookups plus an object construction for each method call.
>
> OK, I didn't know that objects have their own set of attributes that are
> distinct from the class they belong to. I really ought to learn more
> Python!.
>
> (Yet, I have this crazy urge now to create my own bytecode interpreter
> for, if not exactly Python itself, then an equivalent language. Just to
> see if I can do any better than CPython, given the same language
> restraints.
>
> Although I'm hampered a little by not knowing Python well enough. Nor
> OOP, but those are minor details... Anyway it sounds more fun than
> trying to decipher the layers of macros and conditional code that appear
> to be the CPython sources.)
>
>   > IOW, two dict lookups plus an object construction for each method call.
>
> I suppose in many cases an object will have no attributes of its own,
> and so it can rapidly bypass the first lookup. I don't understand the
> need for an object creation (to represent A.B so that it can call it?)
> but perhaps such an object can already exist, prepared ready for use.
>
It's possible to do:

     f = A.B
     ...
     f()

so it's necessary to have an object for A.B.

The question is how much you would gain from optimising A.B() as a
special case (increase in speed vs increase in complexity).

[toc] | [prev] | [next] | [standalone]

#90726

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-05-16 11:08 +0300
Message-ID	<87egmhrll2.fsf@elektro.pacujo.net>
In reply to	#90701

BartC <bc@freeuk.com>:

> I suppose in many cases an object will have no attributes of its own,
> and so it can rapidly bypass the first lookup.

Almost all objects have quite many instance attributes. That's what
tells objects apart.

> I don't understand the need for an object creation (to represent A.B
> so that it can call it?) but perhaps such an object can already exist,
> prepared ready for use.

Note that almost identical semantics could be achieved without a class.
Thus, these two constructs are almost identical:

    class C:
        def __init__(self, x):
            self.x = x

        def square(self):
            return self.x * self.x

        def cube(self):
            return self.x * self.square()

    ##

    class O: pass

    def C(x):
        o = O()

        def square():
            return x * x

        def cube():
            return x * square()

        o.square = square
        o.cube = cube
        return o


IOW, the class is a virtually superfluous concept in Python. Python has
gotten it probably without much thought (other languages at the time had
it). I comes with advantages and disadvantages:

 + improves readability

 + makes objects slightly smaller

 + makes object instantiation slightly faster

 - goes against the grain of ducktyping

 - makes method calls slower

 - makes method call semantics a bit tricky


Marko

[toc] | [prev] | [next] | [standalone]

#90728

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-16 19:40 +1000
Message-ID	<555710a8$0$13001$c3e8da3$5496439d@news.astraweb.com>
In reply to	#90726

On Sat, 16 May 2015 06:08 pm, Marko Rauhamaa wrote:

> Note that almost identical semantics could be achieved without a class.
> Thus, these two constructs are almost identical:
[...]

> IOW, the class is a virtually superfluous concept in Python. Python has
> gotten it probably without much thought (other languages at the time had
> it). I comes with advantages and disadvantages:

Your example is effectively just a way of using closures instead of a class
instance.

Almost anything you can do with classes, you can do with closures. The big
advantage of classes over closures is that you have an interface to access
arbitrary class attributes, while you would need a separate closure for
each and every attribute you want access to.

For example, here is sketch:

class K:
    def method(self, arg):
        return self.spam + arg

k = K()
the_method = k.method  # bound method

as a closure becomes:

def make_closure(instance):  # instance is equivalent to self above
    def method(arg):
        return instance.spam + arg
    return method

the_method = make_closure(obj)  # Some object with a spam field.

The big advantage of a closure is that you have much more strict
encapsulation. The big disadvantage of a closure is that you have much more
strict encapsulation.

>  + improves readability

I wouldn't say that.

>  + makes objects slightly smaller
> 
>  + makes object instantiation slightly faster

Are you sure? Have you actually benchmarked this?

>  - goes against the grain of ducktyping
> 
>  - makes method calls slower
> 
>  - makes method call semantics a bit tricky

A couple more negatives:

- no such thing as inheritance;

- "is-a" relationship tests don't work;

- an unfamiliar idiom for most people;

Also, at least with Python's implementation, a couple of mixed blessings:

± closures are closed against modification;

± internals of the closure are strictly private;

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90732

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-05-16 16:59 +0300
Message-ID	<878ucosjwh.fsf@elektro.pacujo.net>
In reply to	#90728

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> A couple more negatives:
>
> - no such thing as inheritance;

Untrue. My simple Scheme object system (125 lines incl. documentation)
supports multiple inheritance without classes. Maybe I should port that
to Python...

> - "is-a" relationship tests don't work;

From the ducktyping point of view, that is an advantage. The whole
Linnaean categorization of objects is unnecessary ontological chaff. How
many times have people here had to advise newcomers not to inspect type
and instance relations of objects and just call the method?

> - an unfamiliar idiom for most people;

That's impossible to ascertain objectively. Java and JavaScript
programmers (of all people!) routinely deal with closures.


Marko

[toc] | [prev] | [next] | [standalone]

#90749

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-17 04:18 +1000
Message-ID	<55578a02$0$13003$c3e8da3$5496439d@news.astraweb.com>
In reply to	#90732

On Sat, 16 May 2015 11:59 pm, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> 
>> A couple more negatives:
>>
>> - no such thing as inheritance;
> 
> Untrue. My simple Scheme object system (125 lines incl. documentation)

Ah yes, I've seen Javascript code like that too. Each line is thirty
thousand characters long...

*wink*

> supports multiple inheritance without classes. Maybe I should port that
> to Python...

I'd like to see it, but somehow I don't think that your "Scheme object
system" is another name for "closures". We were talking about closures,
weren't we?

It sounds like you have implemented a form of prototype-based object
programming. The question of whether prototype-OOP has inheritance is an
interesting one. Clearly prototypes implement something *like* inheritance,
but it is based in delegation or copying. Delegation-based cloning is quite
close to class-based inheritance, but copying-based cloning is not.

My sense is that I prefer to say that prototypes don't have inheritance in
the same sense as classes, but they have something that plays the same role
as inheritance. But if you want to call it inheritance, I can't really
argue.

>> - "is-a" relationship tests don't work;
> 
> From the ducktyping point of view, that is an advantage. The whole
> Linnaean categorization of objects is unnecessary ontological chaff. How
> many times have people here had to advise newcomers not to inspect type
> and instance relations of objects and just call the method?

Um, is that a trick question? I don't remember the last time.

>> - an unfamiliar idiom for most people;
> 
> That's impossible to ascertain objectively. Java and JavaScript
> programmers (of all people!) routinely deal with closures.

I don't think so. I can't say for Javascript, but for Java, there's a lot of
confusion around closures, they've been described as "evil" with the
recommendation not to use them, and people cannot even agree when they were
introduced!

http://java.dzone.com/articles/whats-wrong-java-8-currying-vs
www.javaworld.com/javaworld/jw-06-2008/jw-06-closures.html

I mean, people had to *debate* the introduction of closures? There were
three competing proposals for them, plus an argument for "don't add them".
Some people say closures were added in Java 7, others say closures have
been there since the beginning, and James Gosling himself says that Java
used inner classes instead of closures but the result was painful...

It seems to me that in the Java community, there's a lot of confusion over
closures, which are seen as an advanced (and rather scary) feature. Hardly
routine.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90750

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-05-16 21:55 +0300
Message-ID	<87zj54qrlw.fsf@elektro.pacujo.net>
In reply to	#90749

Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> On Sat, 16 May 2015 11:59 pm, Marko Rauhamaa wrote:
>> supports multiple inheritance without classes. Maybe I should port that
>> to Python...
>
> I'd like to see it, but somehow I don't think that your "Scheme object
> system" is another name for "closures". We were talking about closures,
> weren't we?

Ok, here's a quick port that have barely tried out:

========================================================================

### Simple OO Framework

class _O: pass

def make_object(*procedures, base=None, bases=None):
    o = _O()
    methods = {}
    setattr(o, '%methods', methods)
    if base is not None:
        inherit_single(o, base)
    elif bases is not None:
        inherit_multi(o, bases)
    for procedure in procedures:
        methods[procedure.__name__] = procedure
        setattr(o, procedure.__name__, procedure)
    return o

def inherit_single(o, base):
    methods = getattr(o, '%methods')
    for name, method in getattr(base, '%methods').items():
        methods[name] = method
        setattr(o, name, method)

def inherit_multi(o, bases):
    for base in bases:
        inherit_single(o, base)

### Used as follows

def TCPClient():
    def connect(socket_address):
        ...
    return make_object(connect)

def SMTPClient():
    tcp_client = TCPClient()
    def connect(host):
        tcp_client.connect((host, 25))
    def send_message(message):
        ...
    return make_object(send_message, base=tcp_client)

client = SMTPClient()
client.connect('mail.example.com')
========================================================================

> I mean, people had to *debate* the introduction of closures? There were
> three competing proposals for them, plus an argument for "don't add them".
> Some people say closures were added in Java 7, others say closures have
> been there since the beginning, and James Gosling himself says that Java
> used inner classes instead of closures but the result was painful...

I'm with those who say anonymous and named inner classes have been there
forever and serve the purpose of closures. Yes, Java's boilerplate
requirements are painful, but if you don't like that, use Python.

> It seems to me that in the Java community, there's a lot of confusion over
> closures, which are seen as an advanced (and rather scary) feature. Hardly
> routine.

Importantly, anonymous inner classes have been in active use by Java
newbs from day one.


Marko

[toc] | [prev] | [next] | [standalone]

#90759

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-05-17 01:51 +0300
Message-ID	<87lhgoqgpv.fsf@elektro.pacujo.net>
In reply to	#90750

Marko Rauhamaa <marko@pacujo.net>:

> Ok, here's a quick port that have barely tried out:

And here's a more complete port (with some possible dunder abuse):

========================================================================
### Simple OO Framework

class _O: pass

def make_object(*procedures, base=None, bases=None):
    o = _O()
    methods = {}
    o.__methods__ = methods
    o.__derived__ = None
    if base is not None:
        _inherit_single(o, base)
    elif bases is not None:
        _inherit_multi(o, bases)
    for procedure in procedures:
        methods[procedure.__name__] = procedure
        def method(*args, __procedure__=procedure, __dispatch__=True, **kwargs):
            if not __dispatch__ or o.__derived__ is None:
                return __procedure__(*args, **kwargs)
            derived = o
            while derived.__derived__ is not None:
                derived = derived.__derived__
            return getattr(derived, __procedure__.__name__)(*args, **kwargs)
        setattr(o, procedure.__name__, method)
    return o

def _inherit_single(o, base):
    methods = o.__methods__
    for name, method in base.__methods__.items():
        methods[name] = method
        setattr(o, name, method)

def _inherit_multi(o, bases):
    for base in bases:
        _inherit_single(o, base)

def delegate(method, *args, **kwargs):
    return method(*args, __dispatch__=False, **kwargs)

### Used as follows

def TCPClient():
    def connect(address):
        pass
    def shut_down():
        pass
    return make_object(connect, shut_down)

def SMTPClient():
    tcp_client = TCPClient()
    def connect(address):
        delegate(tcp_client.connect, address)
        do_stuff()
    def send_message(message):
        pass
    return make_object(connect, send_message, base=tcp_client)

client = SMTPClient()
client.connect(None)
========================================================================


Marko

[toc] | [prev] | [next] | [standalone]

#90787

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-05-17 23:49 +0300
Message-ID	<87mw13orp0.fsf@elektro.pacujo.net>
In reply to	#90759

Iterating. A line still missing:

> ========================================================================
> ### Simple OO Framework
>
> class _O: pass
>
> def make_object(*procedures, base=None, bases=None):
>     o = _O()
>     methods = {}
>     o.__methods__ = methods
>     o.__derived__ = None
>     if base is not None:
>         _inherit_single(o, base)
>     elif bases is not None:
>         _inherit_multi(o, bases)
>     for procedure in procedures:
>         methods[procedure.__name__] = procedure
>         def method(*args, __procedure__=procedure, __dispatch__=True, **kwargs):
>             if not __dispatch__ or o.__derived__ is None:
>                 return __procedure__(*args, **kwargs)
>             derived = o
>             while derived.__derived__ is not None:
>                 derived = derived.__derived__
>             return getattr(derived, __procedure__.__name__)(*args, **kwargs)
>         setattr(o, procedure.__name__, method)
>     return o
>
> def _inherit_single(o, base):
>     methods = o.__methods__
>     for name, method in base.__methods__.items():
>         methods[name] = method
>         setattr(o, name, method)
      base.__derived__ = o
>
> def _inherit_multi(o, bases):
>     for base in bases:
>         _inherit_single(o, base)
>
> def delegate(method, *args, **kwargs):
>     return method(*args, __dispatch__=False, **kwargs)
>
> ### Used as follows
>
> def TCPClient():
>     def connect(address):
>         pass
>     def shut_down():
>         pass
>     return make_object(connect, shut_down)
>
> def SMTPClient():
>     tcp_client = TCPClient()
>     def connect(address):
>         delegate(tcp_client.connect, address)
>         do_stuff()
>     def send_message(message):
>         pass
>     return make_object(connect, send_message, base=tcp_client)
>
> client = SMTPClient()
> client.connect(None)
> ========================================================================

[toc] | [prev] | [next] | [standalone]

#90698

From	Terry Reedy <tjreedy@udel.edu>
Date	2015-05-15 19:54 -0400
Message-ID	<mailman.51.1431734090.17265.python-list@python.org>
In reply to	#90694

On 5/15/2015 5:54 PM, BartC wrote:

> What /is/ a method lookup? Is it when you have this:
>
>   A.B()

This is parsed as (A.B)()

> and need to find whether the expression A (or its class or type) has a
> name B associated with it?

Yes.  Dotted names imply an attribute lookup.

> (And it then needs to check whether B is something that can be called.)

The object resulting from the attribute lookup, A.B (not B exactly), is 
called in a separate operation (with a separate bytecode). It depends on 
the object having a .__call__ method.  The .__call__ method is 
*executed* (rather than *called*, which would lead to infinite regress).

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#90659

From	BartC <bc@freeuk.com>
Date	2015-05-15 10:32 +0100
Message-ID	<Z2j5x.640864$ms.43141@fx30.am4>
In reply to	#90649

On 15/05/2015 07:05, Gregory Ewing wrote:
> BartC wrote:
>> It appears to be those "<=" and "+" operations in the code above where
>> much of the time is spent. When I trace out the execution paths a bit
>> more, I'll have a better picture of how many lines of C code are
>> involved in each iteration.
>
> The path from decoding a bytecode to the C code that
> implements it can be rather convoluted, but there are
> reasons for each of the complications -- mainly to
> do with supporting the ability to override operators
> with C and/or Python code.
>
> If you removed those abilities, the implemention
> would be simpler, and possibly faster. But then the
> language wouldn't be Python any more.

That's the challenge; programs must still work as they did before. (But 
I suppose it can be exasperating for CPython developers if 99% of 
programs could be made faster but can't be because of the 1% that rely 
on certain features).

Still, I'm just seeing what there is in CPython that limits the 
performance, and whether anything can be done /easily/ without going to 
solutions such as PyPy which are unsatisfactory IMO (by being even more 
fantastically complicated, but they don't always give a speed-up either).

For example, there is a /specific/ byte-code called BINARY_ADD, which 
then proceeds to call a /generic/ binary-op handler! This throws away 
the advantage of knowing at byte-code generation time exactly which 
operation is needed.

(Unless I'm just looking at a bunch of macros or maybe there is some 
inlining going on with the compiler reducing all those table-indexing 
operations for 'Add', with direct accesses, but it didn't look like it. 
I'm just glad it doesn't use C++ which would have made it truly 
impossible to figure out what's going on.)

(BTW since I'm having to use Linux to compile this anyway, is there a 
tool available that will tell me whether something in the C sources is a 
function or macro? (And preferably where the definition might be located.))

-- 
Bartc

[toc] | [prev] | [next] | [standalone]

#90702

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-05-16 12:55 +1200
Message-ID	<crnisnF2efoU1@mid.individual.net>
In reply to	#90659

BartC wrote:
> For example, there is a /specific/ byte-code called BINARY_ADD, which 
> then proceeds to call a /generic/ binary-op handler! This throws away 
> the advantage of knowing at byte-code generation time exactly which 
> operation is needed.

While inlining the binary-op handling might give you a
slightly shorter code path, it wouldn't necessarily speed
anything up. It's possible, for example, that the shared
binary-op handler fits in the instruction cache, but the
various inlined copies of it don't, leading to a slowdown.

The only way to be sure about things like that is to try
them and measure. The days when you could predict the speed
of a program just by counting the number of instructions
executed are long gone.

-- 
Greg

[toc] | [prev] | [next] | [standalone]

#90768

From	Jonas Wielicki <jonas@wielicki.name>
Date	2015-05-17 14:25 +0200
Message-ID	<mailman.90.1431866130.17265.python-list@python.org>
In reply to	#90702

[Multipart message — attachments visible in raw view] — view raw

On 16.05.2015 02:55, Gregory Ewing wrote:
> BartC wrote:
>> For example, there is a /specific/ byte-code called BINARY_ADD, which
>> then proceeds to call a /generic/ binary-op handler! This throws away
>> the advantage of knowing at byte-code generation time exactly which
>> operation is needed.
> 
> While inlining the binary-op handling might give you a
> slightly shorter code path, it wouldn't necessarily speed
> anything up. It's possible, for example, that the shared
> binary-op handler fits in the instruction cache, but the
> various inlined copies of it don't, leading to a slowdown.
> 
> The only way to be sure about things like that is to try
> them and measure. The days when you could predict the speed
> of a program just by counting the number of instructions
> executed are long gone.

That, and also, the days where you could guess the number of
instructions executed from looking at the code are also gone. Compilers,
and especially C or C++ compilers, are huge beasts with an insane number
of different optimizations which yield pretty impressive results. Not to
mention that they may know the architecture you’re targeting and can
optimize each build for a different architecture; which is not really
possible if you do optimizations which e.g. rely on cache
characteristics or instruction timings or interactions by hand.

I changed my habits to just trust my compiler a few years ago and have
more readable code in exchange for that. The compiler does a fairly
great job, although gcc still outruns clang for *my* usecases. YMMV.

regards,
jwi

[toc] | [prev] | [next] | [standalone]

#90769

From	BartC <bc@freeuk.com>
Date	2015-05-17 14:41 +0100
Message-ID	<YT06x.482260$_r1.466432@fx39.am4>
In reply to	#90768

On 17/05/2015 13:25, Jonas Wielicki wrote:
> On 16.05.2015 02:55, Gregory Ewing wrote:
>> BartC wrote:
>>> For example, there is a /specific/ byte-code called BINARY_ADD, which
>>> then proceeds to call a /generic/ binary-op handler! This throws away
>>> the advantage of knowing at byte-code generation time exactly which
>>> operation is needed.
>>
>> While inlining the binary-op handling might give you a
>> slightly shorter code path, it wouldn't necessarily speed
>> anything up. It's possible, for example, that the shared
>> binary-op handler fits in the instruction cache, but the
>> various inlined copies of it don't, leading to a slowdown.
>>
>> The only way to be sure about things like that is to try
>> them and measure. The days when you could predict the speed
>> of a program just by counting the number of instructions
>> executed are long gone.
>
> That, and also, the days where you could guess the number of
> instructions executed from looking at the code are also gone. Compilers,
> and especially C or C++ compilers, are huge beasts with an insane number
> of different optimizations which yield pretty impressive results. Not to
> mention that they may know the architecture you’re targeting and can
> optimize each build for a different architecture; which is not really
> possible if you do optimizations which e.g. rely on cache
> characteristics or instruction timings or interactions by hand.
>
> I changed my habits to just trust my compiler a few years ago and have
> more readable code in exchange for that. The compiler does a fairly
> great job, although gcc still outruns clang for *my* usecases.

> YMMV.

It does. For my interpreter projects, gcc -O3 does a pretty good job.

For running a suite of standard benchmarks ('spectral', 'fannkuch', 
'binary-tree', all that lot) in the bytecode language under test, then 
gcc is 30% faster than my own language/compiler. (And 25% faster than 
clang.)

(In that project, gcc can do a lot of inlining, which doesn't seem to be 
practical in CPython as functions are all over the place.)

However, when I plug in an ASM dispatcher to my version (which tries to 
deal with simple bytecodes or some common object types before passing 
control to the HLL to deal with), then I can get /twice as fast/ as gcc 
-O3. (For real programs the difference is narrower, but usually still 
faster than gcc.)

(This approach I don't think will work with CPython, because there don't 
appear to be any simple cases for ASM to deal with! The ASM dispatcher 
keeps essential globals such as the stack pointer and program counter in 
registers, and uses chained 'threaded' code rather than function calls. 
A proportion of byte-codes need to be handled in this environment, 
otherwise it could actually slow things down, as the switch to/from HLL 
code is expensive.)

-- 
Bartc

[toc] | [prev] | [standalone]

Page 3 of 3 — ← Prev page 1 2 [3]

csiph-web

Building CPython

Contents

#90709

#90711

#90713

#90701

#90704

#90726

#90728

#90732

#90749

#90750

#90759

#90787

#90698

#90659

#90702

#90768

#90769