
Re: Negative array indicies and slice()

Started by	Andrew Robinson <andrew3@r3dsolutions.com>
First post	2012-11-01 15:25 -0700
Last post	2012-11-02 10:48 +0000
Articles	4 — 3 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Negative array indicies and slice() Andrew Robinson <andrew3@r3dsolutions.com> - 2012-11-01 15:25 -0700
    Re: Negative array indicies and slice() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-02 02:14 +0000
      Re: Negative array indicies and slice() Andrew Robinson <andrew3@r3dsolutions.com> - 2012-11-02 01:57 -0700
      Re: Negative array indicies and slice() Robert Kern <robert.kern@gmail.com> - 2012-11-02 10:48 +0000

#32577 — Re: Negative array indicies and slice()

From	Andrew Robinson <andrew3@r3dsolutions.com>
Date	2012-11-01 15:25 -0700
Subject	Re: Negative array indicies and slice()
Message-ID	<mailman.3170.1351808865.27098.python-list@python.org>

On 11/01/2012 12:07 PM, Ian Kelly wrote:
> On Thu, Nov 1, 2012 at 5:32 AM, Andrew Robinson
> <andrew3@r3dsolutions.com>  wrote:
>> Hmmmm.... was that PEP the active state of Python, when Tim rejected the bug report?
> Yes. The PEP was accepted and committed in March 2006 for release in
> Python 2.5.  The bug report is from June 2006 and has a version
> classification of Python 2.5, although 2.5 was not actually released
> until September 2006.
That explains Peter's remark.  Thank you.  He looks *much* smarter now.

>
>> Pep 357 merely added cruft with index(), but really solved nothing.  Everything index() does could be implemented in __getitem__ and usually is.
> No.  There is a significant difference between implementing this on
> the container versus implementing it on the indexes.  Ethan
> implemented his string-based slicing on the container, because the
> behavior he wanted was specific to the container type, not the index
> type.  Custom index types like numpy integers on the other hand
> implement __index__ on the index type, because they apply to all
> sequences, not specific containers.

Hmmm...
D'Aprano didn't like the monkey patch; and sub-classing was his fix-all.

Part of my summary is based on that conversation with him, and you 
touched on one of the unfinished points; I responded to him that I 
thought __getitem__ was under-developed.  The slice() object has no 
knowledge of the size of the sequence; nor can it get that size on its 
own, but must passively wait for it to be given to it.

The bottom line is:  __getitem__ must always *PASS* len( seq ) to 
slice() each *time* the slice() object is used.  Since this is the case, 
it would have been better to have list, itself, have a default member 
which takes the raw slice indices and does the conversion itself.  The 
size would not need to be duplicated or passed -- memory savings, & 
speed savings...

I'm just clay pigeoning an idea out here....
Let's apply D'Aprano's logic to numpy; Numpy could just have subclassed 
*list*; so let's ignore pure python as a reason to do anything on 
behalf of Numpy:

Then, let's consider all third-party classes;  These are where 
subclassing becomes a pain -- BUT: I think those could all have been 
injected.

 >>> class ThirdParty( list ):  # Pretend this is someone else's...
...     def __init__(self): return
...     def __getitem__(self,aSlice): return aSlice
...

We know it will default work like this:
 >>> a=ThirdParty()
 >>> a[1:2]
slice(1, 2, None)

# So, here's an injection...
 >>> ThirdParty.superOnlyOfNumpy__getitem__ = ThirdParty.__getitem__
 >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3, 
self.superOnlyOfNumpy__getitem__(aSlice ).step )
 >>> a[5:6]
(1, 3, None)

Numpy could have exported a (workable) function that would modify other 
list functions to affect ONLY numpy data types (eg: a filter).  This 
allows users creating their own classes to inject them with Numpy's 
filter only when they desire;

Recall Tim Peters' "explicit is better than implicit" Zen?

Most importantly normal programs not using Numpy wouldn't have had to 
carry around an extra API check for index() *every* single time the 
heavily used [::] happened.  Memory & speed both.

It's also a monkey patch, in that index() allows *conflicting* 
assumptions in violation of the unexpected monkey patch interaction worry.

eg: Numpy *CAN* release an index() function on their floats -- at which 
point a basic no touch class (list itself) will now accept float as an 
index in direct contradiction of PEP 357's comment on floats... see?

My point isn't that this particular implementation I have shown is the 
best (or even really safe, I'd have to think about that for a while).  
Go ahead and shoot it down...

My point is that, the methods found in slice(), and index() now have 
moved all the code regarding a sequence *out* of the object which has 
information on that sequence.  It smacks of legacy.

The Python parser takes values from many other syntactical constructions 
and passes them directly to their respective objects -- but in the case 
of list(), we have a complicated relationship; and not for any reason 
that can't be handled in a simpler way.

Don't consider the present API legacy for a moment, I'm asking 
hypothetical design questions:

How many users actually keep slice() around from every instance of [::] 
they use?
If it is rare, why create the slice() object in the first place and 
constantly be allocating and de-allocating memory, twice over? (once for 
the original, and once for the repetitive method which computes dynamic 
values?)  Would a single mutable have less overhead, since it is 
destroyed anyway?


#32589

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-11-02 02:14 +0000
Message-ID	<50932c9f$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to	#32577

On Thu, 01 Nov 2012 15:25:51 -0700, Andrew Robinson wrote:

> On 11/01/2012 12:07 PM, Ian Kelly wrote:

>>> Pep 357 merely added cruft with index(), but really solved nothing. 
>>> Everything index() does could be implemented in __getitem__ and
>>> usually is.
>>
>> No.  There is a significant difference between implementing this on the
>> container versus implementing it on the indexes.  Ethan implemented his
>> string-based slicing on the container, because the behavior he wanted
>> was specific to the container type, not the index type.  Custom index
>> types like numpy integers on the other hand implement __index__ on the
>> index type, because they apply to all sequences, not specific
>> containers.
> 
> Hmmm...
> D'Aprano didn't like the monkey patch; and sub-classing was his fix-all.

I pointed out that monkey-patching is a bad idea, even if it worked. But 
it doesn't work -- you simply cannot monkey-patch built-ins in Python. 
Regardless of whether "I like" the m-p or not, *you can't use it* because 
you cannot patch built-in list methods.

The best you could do is subclass list, then shadow the built-in name 
"list" with your subclass. But that gives all sorts of problems too, in 
some ways even worse than monkey-patching.
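
A minimal sketch of that point, assuming CPython: assigning a method onto the built-in list type is refused with a TypeError, while the same assignment on a subclass goes through. The names first_item_always and PatchableList are hypothetical and exist only for this illustration.

```python
def first_item_always(self, index):
    """Replacement __getitem__ used only for illustration."""
    return list.__getitem__(self, 0)


# Patching the built-in type is refused by CPython.
try:
    list.__getitem__ = first_item_always
    patched_builtin = True
except TypeError as exc:
    patched_builtin = False
    print("cannot patch list:", exc)


# A subclass (hypothetical name) is an ordinary class and can be patched.
class PatchableList(list):
    pass


PatchableList.__getitem__ = first_item_always
print(PatchableList([10, 20, 30])[2])   # the patched method ignores the index
```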

You started this thread with a question about slicing. You believe that 
one particular use-case for slicing, which involves interpreting lists as 
circular rather than linear, is the use-case that built-in list slicing 
should have supported.

Fine, you're entitled to your opinion. But that boat sailed about 20 
years ago. Python didn't make that choice, and it won't change now. If 
you write up a PEP, you could aim to have the built-in behaviour changed 
for Python 4 in perhaps another 10-15 years or so. But for the time 
being, that's not what lists, tuples, strings, etc. do. If you want that 
behaviour, if you want a circular list, then you have to implement it 
yourself, and the easiest way to do so is with a subclass.

That's not a "fix-all". I certainly don't believe that subclassing is the 
*only* way to fix this, nor that it will fix "all" things. But it might 
fix *some* things, such as you wanting a data type that is like a 
circular list rather than a linear list.

If you prefer to create a circular-list class from scratch, 
re-implementing all the list-like behaviour, instead of inheriting from an 
existing class, then by all means go right ahead. If you have a good 
reason to spend days or weeks writing, testing, debugging and fine-tuning 
your new class, instead of about 15 minutes with a subclass, then I'm 
certainly not going to tell you not to.
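
As a rough sketch of the subclass route (not anyone's actual implementation from this thread), the class below wraps integer indexes around the end of the list. CircularList is a hypothetical name, and slices are deliberately left with ordinary linear behaviour to keep the example short.

```python
import operator


class CircularList(list):
    """Hypothetical list subclass whose integer indexes wrap around."""

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices keep the ordinary linear behaviour in this sketch.
            return CircularList(super().__getitem__(index))
        if not self:
            raise IndexError("index into empty CircularList")
        # operator.index() accepts ints and anything defining __index__.
        return super().__getitem__(operator.index(index) % len(self))


days = CircularList(["Mon", "Tue", "Wed"])
print(days[3])     # wraps around to position 0
print(days[-4])    # wraps around to position 2
print(days[0:2])   # plain slice, unchanged semantics
```

Everything else (append, sort, iteration, len) is inherited unchanged, which is where the time saving over a from-scratch class comes from.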

> Part of my summary is based on that conversation with him, and you
> touched on one of the unfinished points; I responded to him that I
> thought __getitem__ was under-developed.  The slice() object has no
> knowledge of the size of the sequence; nor can it get that size on its
> own, but must passively wait for it to be given to it.

That's because the slice object is independent of the sequence. As I 
demonstrated, you can pass a slice object to multiple sequences. This is 
a feature, not a bug.
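
To illustrate with a short sketch: one slice object can be applied to sequences of different types and lengths, and its indices() method only resolves concrete bounds once a length is supplied. The variable names are made up for the example.

```python
every_other = slice(1, None, 2)   # same meaning as [1::2]

letters = "abcdefg"
numbers = [0, 1, 2, 3, 4]

# The same slice object works on both sequences.
print(letters[every_other])
print(numbers[every_other])

# Concrete (start, stop, step) depend on the length passed in.
print(every_other.indices(len(letters)))
print(every_other.indices(len(numbers)))
```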

> The bottom line is:  __getitem__ must always *PASS* len( seq ) to
> slice() each *time* the slice() object is used.

The bottom line is: even if you are right, so what?

The slice object doesn't know what the length of the sequence is. What 
makes you think that __getitem__ passes the length to slice()? Why would 
it need to recreate a slice object that already exists?

It is the *sequence*, not the slice object, that is responsible for 
extracting the appropriate items when __getitem__ is called. __getitem__ 
gets a slice object as argument, it doesn't create one. It no more 
creates the slice object than mylist[5] creates the int 5.
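
A small sketch of that division of labour, using a hypothetical Recorder class: __getitem__ is handed a ready-made slice (or int) and the sequence itself supplies the length when it resolves the bounds.

```python
class Recorder:
    """Hypothetical sequence that logs what __getitem__ receives."""

    def __init__(self, items):
        self._items = list(items)
        self.seen = []

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        self.seen.append(key)
        if isinstance(key, slice):
            # The sequence supplies its own length; the slice stores none.
            start, stop, step = key.indices(len(self))
            return [self._items[i] for i in range(start, stop, step)]
        return self._items[key]


r = Recorder("abcdef")
print(r[-3:])
print(r[5])
print(r.seen)   # the slice arrived already built, just like the int 5
```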

> Since this is the case,

But it isn't.

> it would have been better to have list, itself, have a default member
> which takes the raw slice indices and does the conversion itself.  The
> size would not need to be duplicated or passed -- memory savings, &
> speed savings...

We have already demonstrated that slice objects are smaller than (x)range 
objects and three-item tuples. In Python 3.3:

py> sys.getsizeof(range(1, 10, 2))  # xrange was renamed to range in Python 3
24
py> sys.getsizeof((1, 10, 2))
36
py> sys.getsizeof(slice(1, 10, 2))
20
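
Those byte counts depend on the interpreter version and on whether the build is 32-bit or 64-bit, so the exact numbers (and possibly their ordering) may differ elsewhere. A quick way to repeat the measurement on whatever interpreter is at hand, as a sketch:

```python
import sys

candidates = {
    "range": range(1, 10, 2),
    "tuple": (1, 10, 2),
    "slice": slice(1, 10, 2),
}

# getsizeof reports the shallow size of each object in bytes.
sizes = {name: sys.getsizeof(obj) for name, obj in candidates.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```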

It might help you to be taken seriously if you base your reasoning on 
Python as it actually is, rather than counter-factual assumptions.

> I'm just clay pigeoning an idea out here.... Let's apply D'Aprano's
> logic to numpy; Numpy could just have subclassed *list*; 

Sure they could have, if numpy arrays were intended to be a small 
variation on Python lists. But they weren't, so they didn't.

> so let's ignore
> pure python as a reason to do anything on behalf of Numpy:
> 
> Then, let's consider all third-party classes;  These are where
> subclassing becomes a pain -- BUT: I think those could all have been
> injected.
> 
>  >>> class ThirdParty( list ):  # Pretend this is someone else's...
> ...     def __init__(self): return
> ...     def __getitem__(self,aSlice): return aSlice 
> ...

Strange and bizarre semantics for slicing, but okay.

> We know it will default work like this:
>  >>> a=ThirdParty()
>  >>> a[1:2]
> slice(1, 2, None)
> 
> # So, here's an injection...
>  >>> ThirdParty.superOnlyOfNumpy__getitem__ = ThirdParty.__getitem__
>  >>> ThirdParty.__getitem__ = lambda self,aSlice: ( 1, 3,
> self.superOnlyOfNumpy__getitem__(aSlice ).step )
>  >>> a[5:6]
> (1, 3, None)
> 
> Numpy could have exported a (workable) function that would modify other
> list functions to affect ONLY numpy data types (eg: a filter).  This
> allows users creating their own classes to inject them with Numpy's
> filter only when they desire;

Sure, the numpy people could have done this, if they were smoking crack.

Have you actually programmed before? Judging from the techniques you seem 
to prefer for everyday use (monkey-patching other classes) and techniques 
you seem to hate (subclassing), I'm getting the impression you've read 
about bleeding edge programming hacks but never actually written code. 
Sort of like somebody who has never driven a car, but fantasises about 
doing the sort of extreme stunt driving that kills people in real life 
and occasionally even stunt drivers. And now you are *insisting* that 
everyone should drive like that, *all the time*, because stopping at 
traffic lights is so inefficient.

Of course, I could be wrong. Maybe you've been programming for years and 
know exactly what you are doing. But if so, you are coming across as 
exactly the kind of cowboy coder that I pray to all the gods I never have 
to deal with in real life.

[...]
> Don't consider the present API legacy for a moment, I'm asking
> hypothetical design questions:
> 
> How many users actually keep slice() around from every instance of [::]
> they use?

Does it matter? It is supported behaviour, so even *one* user is enough.

> If it is rare, why create the slice() object in the first place and
> constantly be allocating and de-allocating memory, twice over? (once for
> the original, and once for the repetitive method which computes dynamic
> values?)

Huh? As opposed to what? Creating an xrange() object, and constantly 
allocating and de-allocating memory? Or a tuple? Same again. Some sort of 
object has to be created.

And I have no idea what you are talking about with "twice over". What 
"repetitive method which computes dynamic values"?

In any case, I return to my comment earlier in this thread: if you have 
profiled your application and have hard evidence that creating slice 
objects is a bottleneck, then we can talk about optimizing the slice 
objects. Until then, you are wasting your time and ours by prematurely 
optimizing the wrong parts of your code.
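
For anyone who wants a measurement before arguing about cost, here is a rough timeit sketch. It is a micro-benchmark only, not a profile of a real application, and the names data and window are placeholders.

```python
import timeit

data = list(range(1000))
window = slice(10, 20)

# Slice syntax evaluated on each iteration.
literal = timeit.timeit("data[10:20]", globals={"data": data}, number=100_000)

# One pre-built slice object reused on each iteration.
reused = timeit.timeit(
    "data[window]",
    globals={"data": data, "window": window},
    number=100_000,
)

print(f"literal slice: {literal:.4f}s  reused slice: {reused:.4f}s")
```

Whatever difference shows up here should be weighed against the rest of the program's runtime before treating it as a bottleneck.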

> Would a single mutable have less overhead, since it is
> destroyed anyway?

What? This question makes no sense. Why do you think that mutable objects 
have "less overhead" than immutable ones?

-- 
Steven


#32603

From	Andrew Robinson <andrew3@r3dsolutions.com>
Date	2012-11-02 01:57 -0700
Message-ID	<mailman.3185.1351846745.27098.python-list@python.org>
In reply to	#32589

Hi Ian,

I apologize for trying your patience with the badly written code 
example.  All objects were meant to be ThirdParty(); the demo was only 
to show how a slice() filter could have been applied for the reasons 
PEP 357 gave for introducing index(),
eg: because numpy items passed to __getitem__ via slice syntax [::] 
were illegal values.
PEP 357 is what specifically mentioned Numpy types -- which is 
the only reason I used the name in the example;  I could just as 
well have used a string.

I am fully aware of what numpy does -- I have used it; modified the 
fortran interfaces underneath, etc.

The index() method, however, affects *all* list objects in Python, not 
just Numpy's -- correct?

I'll write a working piece of code tomorrow to demonstrate the filter 
very clearly rather than a skeleton, and test it before posting.


#32615

From	Robert Kern <robert.kern@gmail.com>
Date	2012-11-02 10:48 +0000
Message-ID	<mailman.3196.1351853351.27098.python-list@python.org>
In reply to	#32589

On 11/2/12 8:57 AM, Andrew Robinson wrote:
> Hi Ian,
>
> I apologize for trying your patience with the badly written code example.  All
> objects were meant to be ThirdParty(); the demo was only to show how a slice()
> filter could have been applied for the reasons PEP 357 gave for introducing index(),
> eg: because numpy items passed to __getitem__ via slice syntax [::] were
> illegal values.
> PEP 357 is what specifically mentioned Numpy types -- which is the only
> reason I used the name in the example;  I could just as well have used a string.
>
> I am fully aware of what numpy does -- I have used it; modified the fortran
> interfaces underneath, etc.
>
> The index() method, however, affects *all* list objects in Python, not just
> Numpy's -- correct?

Please forget that PEP 357 mentions slices at all. The motivation for the 
__index__() method (not index()) goes far beyond slices. I'm not really sure why 
they are given such a prominent place in the PEP. Let me try to lay out the 
motivation more clearly.

numpy has objects that represent integers but cannot be subclasses of the Python 
int or long objects because their internal representations are different. These 
are the width-specific types: uint8, int16, int64, etc. Before __index__() was 
introduced, all indexing operations in the builtin Python sequence types 
strictly checked for int or long objects and rejected other objects. We wanted 
to provide a generic method that third party types could implement to say, "Yes, 
I really am an integer, here is my value in a canonical representation you can 
understand." We could not use __int__() for this purpose because it has 
additional semantics, namely conversion from not-integers to integers. This is 
why floats are mentioned; they do not generally represent integers but they do 
define an __int__() method for their conversion to ints by truncating toward 
zero. Generally, they should be rejected as indices. With the __index__() 
method, we have a solution: int16 and the rest get __index__() methods and float 
doesn't.
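
A minimal sketch of the protocol, using a hypothetical Int16Like class as a stand-in for a width-specific integer type (this is not numpy's implementation): defining __index__() is enough for a plain list to accept the object as an index or slice bound, while a float is still rejected.

```python
class Int16Like:
    """Hypothetical stand-in for a width-specific integer type."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # "I really am an integer; here is my canonical value."
        return self._value


items = ["a", "b", "c", "d"]
print(items[Int16Like(2)])
print(items[Int16Like(1):Int16Like(3)])

# float defines __int__() but not __index__(), so it is refused.
try:
    items[2.0]
    float_rejected = False
except TypeError as exc:
    float_rejected = True
    print("float rejected:", exc)
```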

This is used where an integer index or offset is needed, not just in slices. 
List indices, file.seek(), mmap.mmap(), etc. The change to use PyIndex_Check() 
instead of PyInt_Check() was not very difficult or extensive. Even if you were 
to change the slicing API for your other reasons, __index__() would still be needed.
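
To show the protocol at work outside of slicing, here is a short sketch with a hypothetical Offset class. It uses operator.index(), range() and hex(), which accept any object defining __index__(); these are chosen for the example and are not the callers listed above.

```python
import operator


class Offset:
    """Hypothetical integer-like type; not taken from numpy."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


n = Offset(3)
print(operator.index(n))   # the canonical int value
print(list(range(n)))      # range() accepts __index__ objects
print(hex(n))              # so does hex()

# __int__ converts lossy values; __index__ refuses them.
print(int(3.7))
try:
    operator.index(3.7)
    float_refused = False
except TypeError:
    float_refused = True
print("operator.index(3.7) refused:", float_refused)
```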

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
