| Started by | Andrew Robinson <andrew3@r3dsolutions.com> |
|---|---|
| First post | 2012-11-01 15:25 -0700 |
| Last post | 2012-11-02 10:48 +0000 |
| Articles | 4 — 3 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Negative array indicies and slice() Andrew Robinson <andrew3@r3dsolutions.com> - 2012-11-01 15:25 -0700
Re: Negative array indicies and slice() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-02 02:14 +0000
Re: Negative array indicies and slice() Andrew Robinson <andrew3@r3dsolutions.com> - 2012-11-02 01:57 -0700
Re: Negative array indicies and slice() Robert Kern <robert.kern@gmail.com> - 2012-11-02 10:48 +0000
| From | Andrew Robinson <andrew3@r3dsolutions.com> |
|---|---|
| Date | 2012-11-01 15:25 -0700 |
| Subject | Re: Negative array indicies and slice() |
| Message-ID | <mailman.3170.1351808865.27098.python-list@python.org> |
On 11/01/2012 12:07 PM, Ian Kelly wrote:
> On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson
> <andrew3@r3dsolutions.com> wrote:
>> Hmmmm.... was that PEP the active state of Python, when Tim rejected the bug report?
> Yes. The PEP was accepted and committed in March 2006 for release in
> Python 2.5. The bug report is from June 2006 and has a version
> classification of Python 2.5, although 2.5 was not actually released
> until September 2006.

That explains Peter's remark. Thank you. He looks *much* smarter now.

>> Pep 357 merely added cruft with index(), but really solved nothing. Everything index() does could be implemented in __getitem__ and usually is.
> No. There is a significant difference between implementing this on
> the container versus implementing it on the indexes. Ethan
> implemented his string-based slicing on the container, because the
> behavior he wanted was specific to the container type, not the index
> type. Custom index types like numpy integers on the other hand
> implement __index__ on the index type, because they apply to all
> sequences, not specific containers.

Hmmm...
D'Aprano didn't like the monkey patch; and sub-classing was his fix-all.
Part of my summary is based on that conversation with him, and you touched on one of the unfinished points; I responded to him that I thought __getitem__ was under-developed. The object slice() has no knowledge of the size of the sequence; nor can it get that size on its own, but must passively wait for it to be given to it.

The bottom line is: __getitem__ must always *PASS* len( seq ) to slice() each *time* the slice() object is used. Since this is the case, it would have been better to have list, itself, have a default member which takes the raw slice indices and does the conversion itself. The size would not need to be duplicated or passed -- memory savings, & speed savings...

I'm just clay pigeoning an idea out here....
Let's apply D'Aprano's logic to numpy; Numpy could just have subclassed *list*; so let's ignore pure python as a reason to do anything on behalf of Numpy:

Then, let's consider all third party classes; these are where subclassing becomes a pain -- BUT: I think those could all have been injected.

>>> class ThirdParty( list ): # Pretend this is someone else's...
...     def __init__(self): return
...     def __getitem__(self,aSlice): return aSlice
...

We know it will work like this by default:
>>> a=ThirdParty()
>>> a[1:2]
slice(1, 2, None)

# So, here's an injection...
>>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
>>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3, self.superOnlyOfNumpy__getitem__(aSlice ).step )
>>> a[5:6]
(1, 3, None)

Numpy could have exported a (workable) function that would modify other list functions to affect ONLY numpy data types (eg: a filter). This allows users creating their own classes to inject them with Numpy's filter only when they desire; recall Tim Peters' "explicit is better than implicit" Zen?

Most importantly, normal programs not using Numpy wouldn't have had to carry around an extra API check for index() *every* single time the heavily used [::] happened. Memory & speed both.

It's also a monkey patch, in that index() allows *conflicting* assumptions, in violation of the unexpected monkey patch interaction worry.

eg: Numpy *CAN* release an index() function on their floats -- at which point a basic no-touch class (list itself) will now accept float as an index, in direct contradiction of PEP 357's comment on floats... see?

My point isn't that this particular implementation I have shown is the best (or even really safe; I'd have to think about that for a while). Go ahead and shoot it down...

My point is that the methods found in slice(), and index(), have now moved all the code regarding a sequence *out* of the object which has information on that sequence. It smacks of legacy.

The Python parser takes values from many other syntactical constructions and passes them directly to their respective objects -- but in the case of list(), we have a complicated relationship; and not for any reason that can't be handled in a simpler way.

Don't consider the present API legacy for a moment; I'm asking hypothetical design questions:

How many users actually keep slice() around from every instance of [::] they use?
If it is rare, why create the slice() object in the first place and constantly be allocating and de-allocating memory, twice over? (once for the original, and once for the repetitive method which computes dynamic values?)

Would a single mutable have less overhead, since it is destroyed anyway?
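
For readers following the point about the slice object not knowing the sequence length, here is a minimal sketch of how a container can resolve a slice against its own length using the standard slice.indices() method. The class name Window and its attributes are hypothetical, chosen only for illustration; this is not how the built-in list is implemented internally.

```python
class Window:
    """Hypothetical minimal sequence wrapper (illustrative name only)."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # The slice stores only start/stop/step. The container supplies
            # its own length so negative and omitted bounds can be resolved.
            start, stop, step = key.indices(len(self))
            return [self._items[i] for i in range(start, stop, step)]
        return self._items[key]


w = Window("abcdefgh")
resolved = slice(-3, None).indices(len(w))  # (5, 8, 1)
tail = w[-3:]                               # ['f', 'g', 'h']
every_other_reversed = w[::-2]              # ['h', 'f', 'd', 'b']
```

The length is handed to the slice only at the moment the container needs concrete bounds, which is the arrangement being debated in this thread.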
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-11-02 02:14 +0000 |
| Message-ID | <50932c9f$0$29967$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #32577 |
On Thu, 01 Nov 2012 15:25:51 -0700, Andrew Robinson wrote:

> On 11/01/2012 12:07 PM, Ian Kelly wrote:
>>> Pep 357 merely added cruft with index(), but really solved nothing.
>>> Everything index() does could be implemented in __getitem__ and
>>> usually is.
>>
>> No. There is a significant difference between implementing this on the
>> container versus implementing it on the indexes. Ethan implemented his
>> string-based slicing on the container, because the behavior he wanted
>> was specific to the container type, not the index type. Custom index
>> types like numpy integers on the other hand implement __index__ on the
>> index type, because they apply to all sequences, not specific
>> containers.
>
> Hmmm...
> D'Aprano didn't like the monkey patch; and sub-classing was his fix-all.

I pointed out that monkey-patching is a bad idea, even if it worked. But it doesn't work -- you simply cannot monkey-patch built-ins in Python. Regardless of whether "I like" the m-p or not, *you can't use it* because you can't patch built-in list methods.

The best you could do is subclass list, then shadow the built-in name "list" with your subclass. But that gives all sorts of problems too, in some ways even worse than monkey-patching.

You started this thread with a question about slicing. You believe that one particular use-case for slicing, which involves interpreting lists as circular rather than linear, is the use-case that built-in list slicing should have supported.

Fine, you're entitled to your opinion. But that boat sailed about 20 years ago. Python didn't make that choice, and it won't change now. If you write up a PEP, you could aim to have the built-in behaviour changed for Python 4 in perhaps another 10-15 years or so. But for the time being, that's not what lists, tuples, strings, etc. do. If you want that behaviour, if you want a circular list, then you have to implement it yourself, and the easiest way to do so is with a subclass.

That's not a "fix-all". I certainly don't believe that subclassing is the *only* way to fix this, nor that it will fix "all" things. But it might fix *some* things, such as you wanting a data type that is like a circular list rather than a linear list.

If you prefer to create a circular-list class from scratch, re-implementing all the list-like behaviour, instead of inheriting from an existing class, then by all means go right ahead. If you have a good reason to spend days or weeks writing, testing, debugging and fine-tuning your new class, instead of about 15 minutes with a subclass, then I'm certainly not going to tell you not to.

> Part of my summary is based on that conversation with him, and you
> touched on one of the unfinished points; I responded to him that I
> thought __getitem__ was under-developed. The object slice() has no
> knowledge of the size of the sequence; nor can it get that size on its
> own, but must passively wait for it to be given to it.

That's because the slice object is independent of the sequence. As I demonstrated, you can pass a slice object to multiple sequences. This is a feature, not a bug.

> The bottom line is: __getitem__ must always *PASS* len( seq ) to
> slice() each *time* the slice() object is used.

The bottom line is: even if you are right, so what?

The slice object doesn't know what the length of the sequence is. What makes you think that __getitem__ passes the length to slice()? Why would it need to recreate a slice object that already exists?

It is the *sequence*, not the slice object, that is responsible for extracting the appropriate items when __getitem__ is called. __getitem__ gets a slice object as argument, it doesn't create one. It no more creates the slice object than mylist[5] creates the int 5.

> Since this is the case,

But it isn't.

> it would have been better to have list, itself, have a default member
> which takes the raw slice indices and does the conversion itself. The
> size would not need to be duplicated or passed -- memory savings, &
> speed savings...

We have already demonstrated that slice objects are smaller than (x)range objects and three-item tuples. In Python 3.3:

py> sys.getsizeof(range(1, 10, 2))  # xrange was renamed to range in Python 3
24
py> sys.getsizeof((1, 10, 2))
36
py> sys.getsizeof(slice(1, 10, 2))
20

It might help you to be taken seriously if you base your reasoning on Python as it actually is, rather than counter-factual assumptions.

> I'm just clay pigeoning an idea out here.... Let's apply D'Aprano's
> logic to numpy; Numpy could just have subclassed *list*;

Sure they could have, if numpy arrays were intended to be a small variation on Python lists. But they weren't, so they didn't.

> so let's ignore
> pure python as a reason to do anything on behalf of Numpy:
>
> Then, let's consider all third party classes; these are where
> subclassing becomes a pain -- BUT: I think those could all have been
> injected.
>
> >>> class ThirdParty( list ): # Pretend this is someone else's...
> ...     def __init__(self): return
> ...     def __getitem__(self,aSlice): return aSlice
> ...

Strange and bizarre semantics for slicing, but okay.

> We know it will work like this by default:
> >>> a=ThirdParty()
> >>> a[1:2]
> slice(1, 2, None)
>
> # So, here's an injection...
> >>> ThirdParty.superOnlyOfNumpy__getitem__ = MyClass.__getitem__
> >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
> self.superOnlyOfNumpy__getitem__(aSlice ).step )
> >>> a[5:6]
> (1, 3, None)
>
> Numpy could have exported a (workable) function that would modify other
> list functions to affect ONLY numpy data types (eg: a filter). This
> allows users creating their own classes to inject them with Numpy's
> filter only when they desire;

Sure, the numpy people could have done this, if they were smoking crack.

Have you actually programmed before? Judging from the techniques you seem to prefer for everyday use (monkey-patching other classes) and techniques you seem to hate (subclassing), I'm getting the impression you've read about bleeding edge programming hacks but never actually written code. Sort of like somebody who has never driven a car, but fantasises about doing the sort of extreme stunt driving that kills people in real life and occasionally even stunt drivers. And now you are *insisting* that everyone should drive like that, *all the time*, because stopping at traffic lights is so inefficient.

Of course, I could be wrong. Maybe you've been programming for years and know exactly what you are doing. But if so, you are coming across as exactly the kind of cowboy coder that I pray to all the gods I never have to deal with in real life.

[...]

> Don't consider the present API legacy for a moment; I'm asking
> hypothetical design questions:
>
> How many users actually keep slice() around from every instance of [::]
> they use?

Does it matter? It is supported behaviour, so even *one* user is enough.

> If it is rare, why create the slice() object in the first place and
> constantly be allocating and de-allocating memory, twice over? (once for
> the original, and once for the repetitive method which computes dynamic
> values?)

Huh? As opposed to what? Creating an xrange() object, and constantly allocating and de-allocating memory? Or a tuple? Same again. Some sort of object has to be created.

And I have no idea what you are talking about with "twice over". What "repetitive method which computes dynamic values"?

In any case, I return to my comment earlier in this thread: if you have profiled your application and have hard evidence that creating slice objects is a bottleneck, then we can talk about optimizing the slice objects. Until then, you are wasting your time and ours by prematurely optimizing the wrong parts of your code.

> Would a single mutable have less overhead, since it is
> destroyed anyway?

What? This question makes no sense. Why do you think that mutable objects have "less overhead" than immutable ones?

--
Steven
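
Three of the claims in this reply can be checked directly. The following is a small sketch, run under CPython 3 as an assumption; the class name Recorder is hypothetical and exists only to show what __getitem__ is handed.

```python
# 1. Built-in types reject attribute assignment, so list.__getitem__
#    cannot be monkey-patched (CPython raises TypeError).
try:
    list.__getitem__ = lambda self, key: None
    patched = True
except TypeError:
    patched = False

# 2. A slice object is independent of any sequence and can be reused
#    across several sequences of different types and lengths.
last_two = slice(-2, None)
from_list = [1, 2, 3, 4][last_two]
from_str = "python"[last_two]
from_tuple = (10, 20, 30)[last_two]


# 3. __getitem__ receives an already-built slice object; it does not
#    create one itself.
class Recorder(list):
    """Hypothetical subclass that just returns whatever key it is given."""

    def __getitem__(self, key):
        return key


received = Recorder([1, 2, 3])[1:2]
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks the exception type.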
| From | Andrew Robinson <andrew3@r3dsolutions.com> |
|---|---|
| Date | 2012-11-02 01:57 -0700 |
| Message-ID | <mailman.3185.1351846745.27098.python-list@python.org> |
| In reply to | #32589 |
Hi Ian,

I apologize for trying your patience with the badly written code example. All objects were meant to be ThirdParty(); the demo was only to show how a slice() filter could have been applied for the reasons PEP357 made index() exist, eg: because numpy items passed to __getitem__ via slice syntax [::] were illegal values.

PEP 357 is the one that specifically mentioned Numpy types -- which is the only reason I used the name in the example; I could just as well have used a string.

I am fully aware of what numpy does -- I have used it; modified the fortran interfaces underneath, etc.

The index() method, however, affects *all* list objects in Python, not just Numpy's -- correct?

I'll write a working piece of code tomorrow to demonstrate the filter very clearly rather than a skeleton, and test it before posting.
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2012-11-02 10:48 +0000 |
| Message-ID | <mailman.3196.1351853351.27098.python-list@python.org> |
| In reply to | #32589 |
On 11/2/12 8:57 AM, Andrew Robinson wrote:
> Hi Ian,
>
> I apologize for trying your patience with the badly written code example. All
> objects were meant to be ThirdParty(); the demo was only to show how a slice()
> filter could have been applied for the reasons PEP357 made index() exist,
> eg: because numpy items passed to __getitem__ via slice syntax [::] were
> illegal values.
> PEP 357 is the one that specifically mentioned Numpy types -- which is the only
> reason I used the name in the example; I could just as well have used a string.
>
> I am fully aware of what numpy does -- I have used it; modified the fortran
> interfaces underneath, etc.
>
> The index() method, however, affects *all* list objects in Python, not just
> Numpy's -- correct?

Please forget that PEP 357 mentions slices at all. The motivation for the __index__() method (not index()) goes far beyond slices. I'm not really sure why they are given such a prominent place in the PEP. Let me try to lay out the motivation more clearly.

numpy has objects that represent integers but cannot be subclasses of the Python int or long objects because their internal representations are different. These are the width-specific types: uint8, int16, int64, etc. Before __index__() was introduced, all indexing operations in the builtin Python sequence types strictly checked for int or long objects and rejected other objects. We wanted to provide a generic method that third party types could implement to say, "Yes, I really am an integer, here is my value in a canonical representation you can understand." We could not use __int__() for this purpose because it has additional semantics, namely conversion from not-integers to integers. This is why floats are mentioned; they do not generally represent integers, but they do define an __int__() method that converts them to ints by truncating toward zero. Generally, they should be rejected as indices. With the __index__() method, we have a solution: int16 and the rest get __index__() methods and float doesn't.

This is used wherever an integer index or offset is needed, not just in slices: list indices, file.seek(), mmap.mmap(), etc. The change to use PyIndex_Check() instead of PyInt_Check() was not very difficult or extensive. Even if you were to change the slicing API for your other reasons, __index__() would still be needed.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
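
The distinction between __index__() and __int__() described above can be sketched in a few lines. The class Int16Like is a hypothetical stand-in for a width-specific integer type; it is not numpy's implementation, only an illustration of the protocol.

```python
import operator


class Int16Like:
    """Hypothetical stand-in for a fixed-width integer type such as int16."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # "Yes, I really am an integer; here is my value as a plain int."
        return self._value


letters = ["a", "b", "c", "d", "e"]
i = Int16Like(1)
j = Int16Like(4)

single = letters[i]         # plain indexing consults __index__
window = letters[i:j]       # slice bounds consult __index__ as well
as_int = operator.index(j)  # the explicit form of the same protocol

# float defines __int__ (a lossy conversion) but not __index__,
# so the built-in list rejects it as an index.
truncated = int(-1.5)
try:
    letters[1.0]
    float_accepted = True
except TypeError:
    float_accepted = False
```

Because the method lives on the index type, every built-in sequence accepts Int16Like without any change to the containers themselves.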