

Groups > comp.lang.python > #32599 > unrolled thread

Re: Organisation of python classes and their methods

Started by: Martin Hewitson <martinhewitson@mac.com>
First post: 2012-11-02 09:20 +0100
Last post: 2012-11-02 11:29 -0700
Articles: 6 (3 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Organisation of python classes and their methods Martin Hewitson <martinhewitson@mac.com> - 2012-11-02 09:20 +0100
    Re: Organisation of python classes and their methods Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-11-02 11:49 +0100
      Re: Organisation of python classes and their methods Martin Hewitson <martinhewitson@mac.com> - 2012-11-02 15:47 +0100
    Re: Organisation of python classes and their methods Paul Rubin <no.email@nospam.invalid> - 2012-11-02 10:24 -0700
      Re: Organisation of python classes and their methods Martin Hewitson <martinhewitson@mac.com> - 2012-11-02 18:40 +0100
        Re: Organisation of python classes and their methods Paul Rubin <no.email@nospam.invalid> - 2012-11-02 11:29 -0700

#32599 — Re: Organisation of python classes and their methods

From: Martin Hewitson <martinhewitson@mac.com>
Date: 2012-11-02 09:20 +0100
Subject: Re: Organisation of python classes and their methods
Message-ID: <mailman.3181.1351844460.27098.python-list@python.org>

On 2, Nov, 2012, at 09:00 AM, Peter Otten <__peter__@web.de> wrote:

> Martin Hewitson wrote:
> 
>> Dear list,
>> 
>> I'm relatively new to Python and have googled and googled but haven't
>> found a reasonable answer to this question, so I thought I'd ask it here.
>> 
>> I'm beginning a large Python project which contains many packages, modules
>> and classes. The organisation of those is clear to me.
>> 
>> Now, the classes can contain many methods (100s of data analysis methods)
>> which operate on instances of the class they belong to. These methods can
>> be long and complex. So if I put these methods all in the module file
>> inside the class, the file will get insanely long. Reading on google, the
>> answer is usually "refactor", but that really doesn't make sense here.
>> It's just that the methods are many, and each method can be a long piece
>> of code. So, is there a way to put these methods in their own files and
>> have them 'included' in the class somehow? I read a little about mixins
>> but all the solutions looked very hacky. Is there an official python way
>> to do this? I don't like having source files with 100's of lines of code
>> in, let alone 1000's.
> 
> You googled, found the right answer ("refactor"), didn't like it and are now 
> looking to cure the symptoms of the original problem?
> Seriously, a good editor can deal with a long source file, but a class with 
> hundreds of methods will bring trouble to any old brain.

Well, here we disagree. Suppose I have a class which encapsulates time-series data. Below is a list of the absolute minimum methods one would have to process that data. That's close to 100 already before even having any specialised methods for dealing with the data. Each of these methods will have maybe 20 lines of documentation. That's 2000 lines already. And what if someone wants to extend that class to add their own processing methods? It would be a maintenance nightmare for them to edit the actual class file, which they would then have to repeat each time a new version of the 'official' class file is released.

So maybe some rethinking of this design is needed to handle this 'limitation' of python. Perhaps grouping the processing algorithms into methods of processing classes, and then passing the data objects to these methods. But somehow I don't like that. I have the feeling these methods would end up peppered with things like:

if this data type, do this
else if this data type, do this
else ....

Normally this would be solved by overloading methods in different data subclasses.
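
One way to keep processing code outside the data classes without the if/else chain is type-based dispatch. The following is only a minimal sketch, assuming Python 3.4 or later for `functools.singledispatch`; the classes `TimeSeries` and `FrequencySeries` and the function `describe` are hypothetical names chosen for illustration and do not come from the post.

```python
from functools import singledispatch


class TimeSeries:
    """Hypothetical data class holding evenly sampled values."""
    def __init__(self, values, dt):
        self.values = list(values)
        self.dt = dt


class FrequencySeries:
    """Hypothetical data class holding spectral values."""
    def __init__(self, values, df):
        self.values = list(values)
        self.df = df


@singledispatch
def describe(data):
    # Fallback for types with no registered implementation.
    raise TypeError("unsupported data type: " + type(data).__name__)


@describe.register(TimeSeries)
def _(data):
    return "time series, %d samples, dt=%g s" % (len(data.values), data.dt)


@describe.register(FrequencySeries)
def _(data):
    return "frequency series, %d bins, df=%g Hz" % (len(data.values), data.df)


print(describe(TimeSeries([1, 2, 3], 0.5)))
print(describe(FrequencySeries([4, 5], 2.0)))
```

Each data type gets its own implementation, much as an overridden method would, but a third party can register a new one without editing the class file.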

More thinking needed, clearly.

Martin



    'abs'
    'acos'
    'asin'
    'atan'
    'atan2'
    'average'
    'cohere'
    'conv'
    'corr'
    'cos'
    'cov'
    'cpsd'
    'detrend'
    'dft'
    'diff'
    'downsample'
    'exp'
    'export'
    'fft'
    'fftfilt'
    'filter'
    'filtfilt'
    'find'
    'heterodyne'
    'hist'
    'imag'
    'integrate'
    'interp'
    'join'
    'le'
    'lincom'
    'ln'
    'load'
    'log'
    'log10'
    'lscov'
    'max'
    'mean'
    'median'
    'min'
    'minus'
    'mode'
    'mpower'
    'mrdivide'
    'mtimes'
    'ne'
    'norm'
    'or'
    'plot'
    'plus'
    'polyfit'
    'power'
    'psd'
    'rdivide'
    'real'
    'resample'
    'rms'
    'round'
    'save'
    'scale'
    'search'
    'select'
    'sin'
    'smoother'
    'sort'
    'spectrogram'
    'split'
    'sqrt'
    'std'
    'sum'
    'sumjoin'
    'svd'
    'tan'
    'tfe'
    'timeaverage'
    'times'
    'timeshift'
    'transpose'
    'uminus'
    'upsample'
    'zeropad'



#32621

From: Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date: 2012-11-02 11:49 +0100
Message-ID: <62lcm9-4t5.ln1@satorlaser.homedns.org>
In reply to: #32599

Am 02.11.2012 09:20, schrieb Martin Hewitson:
> Well, here we disagree. Suppose I have a class which encapsulates
> time-series data. Below is a list of the absolute minimum methods one
> would have to process that data.
[...]
 > 'abs' 'acos' 'asin' 'atan' 'atan2' 'average' 'cohere' 'conv' 'corr'
 > 'cos' 'cov' 'cpsd' 'detrend' 'dft' 'diff' 'downsample' 'exp'
 > 'export' 'fft' 'fftfilt' 'filter' 'filtfilt' 'find' 'heterodyne'
 > 'hist' 'imag' 'integrate' 'interp' 'join' 'le' 'lincom' 'ln' 'load'
 > 'log' 'log10' 'lscov' 'max' 'mean' 'median' 'min' 'minus' 'mode'
 > 'mpower' 'mrdivide' 'mtimes' 'ne' 'norm' 'or' 'plot' 'plus'
 > 'polyfit' 'power' 'psd' 'rdivide' 'real' 'resample' 'rms' 'round'
 > 'save' 'scale' 'search' 'select' 'sin' 'smoother' 'sort'
 > 'spectrogram' 'split' 'sqrt' 'std' 'sum' 'sumjoin' 'svd' 'tan' 'tfe'
 > 'timeaverage' 'times' 'timeshift' 'transpose' 'uminus' 'upsample'
 > 'zeropad'


Just as a suggestion, you can separate these into categories:

1. Things that modify the data, yielding a different (although derived) 
data set, e.g. import/load, split, join, plus, minus, zeropad.
2. Things that operate on the data without modifying it, e.g. 
export/save, average, find, plot, integrate.

The latter can easily be removed from the class. Since they don't touch 
the content, they can't invalidate internals and can't break encapsulation.

For the former, providing general means to construct or modify the data 
(like e.g. adding records or joining sequences) is also all that needs 
to remain inside the class to ensure internal consistency, everything 
else can be built on top of these using external functions.
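
A minimal sketch of this split, under the assumption of a hypothetical `TimeSeries` class that stores its samples as immutable tuples (the class layout and attribute names are illustrative only):

```python
class TimeSeries:
    """Hypothetical minimal class: it only guards internal consistency."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = tuple(times)
        self._values = tuple(values)

    @property
    def times(self):
        return self._times

    @property
    def values(self):
        return self._values

    def join(self, other):
        # Category 1: yields a new, derived data set, so it stays in the class.
        return TimeSeries(self._times + other.times, self._values + other.values)


def average(ts):
    # Category 2: read-only, built purely on the public interface.
    return sum(ts.values) / len(ts.values)


def find(ts, predicate):
    # Category 2: returns the times at which predicate(value) holds.
    return [t for t, v in zip(ts.times, ts.values) if predicate(v)]


a = TimeSeries([0, 1], [2.0, 4.0])
b = TimeSeries([2, 3], [6.0, 8.0])
joined = a.join(b)
print(average(joined))
print(find(joined, lambda v: v > 3))
```

Because `average` and `find` only read through the public properties, they can live in separate modules and be extended by users without touching the class file.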


Uli




#32633

From: Martin Hewitson <martinhewitson@mac.com>
Date: 2012-11-02 15:47 +0100
Message-ID: <mailman.3207.1351867667.27098.python-list@python.org>
In reply to: #32621

On 2, Nov, 2012, at 11:49 AM, Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> wrote:

> Am 02.11.2012 09:20, schrieb Martin Hewitson:
>> Well, here we disagree. Suppose I have a class which encapsulates
>> time-series data. Below is a list of the absolute minimum methods one
>> would have to process that data.
> [...]
> > 'abs' 'acos' 'asin' 'atan' 'atan2' 'average' 'cohere' 'conv' 'corr'
> > 'cos' 'cov' 'cpsd' 'detrend' 'dft' 'diff' 'downsample' 'exp'
> > 'export' 'fft' 'fftfilt' 'filter' 'filtfilt' 'find' 'heterodyne'
> > 'hist' 'imag' 'integrate' 'interp' 'join' 'le' 'lincom' 'ln' 'load'
> > 'log' 'log10' 'lscov' 'max' 'mean' 'median' 'min' 'minus' 'mode'
> > 'mpower' 'mrdivide' 'mtimes' 'ne' 'norm' 'or' 'plot' 'plus'
> > 'polyfit' 'power' 'psd' 'rdivide' 'real' 'resample' 'rms' 'round'
> > 'save' 'scale' 'search' 'select' 'sin' 'smoother' 'sort'
> > 'spectrogram' 'split' 'sqrt' 'std' 'sum' 'sumjoin' 'svd' 'tan' 'tfe'
> > 'timeaverage' 'times' 'timeshift' 'transpose' 'uminus' 'upsample'
> > 'zeropad'
> 
> 
> Just as a suggestion, you can separate these into categories:
> 
> 1. Things that modify the data, yielding a different (although derived) data set, e.g. import/load, split, join, plus, minus, zeropad.
> 2. Things that operate on the data without modifying it, e.g. export/save, average, find, plot, integrate.
> 
> The latter can easily be removed from the class. Since they don't touch the content, they can't invalidate internals and can't break encapsulation.
> 
> For the former, providing general means to construct or modify the data (like e.g. adding records or joining sequences) is also all that needs to remain inside the class to ensure internal consistency, everything else can be built on top of these using external functions.
> 

Thank you all so much for your thoughts and suggestions. I need to absorb all of this and decide on the best approach in this case.

Thanks again,

Martin

> 
> Uli
> 
> 
> 



#32639

From: Paul Rubin <no.email@nospam.invalid>
Date: 2012-11-02 10:24 -0700
Message-ID: <7xhap7dijb.fsf@ruckus.brouhaha.com>
In reply to: #32599

Martin Hewitson <martinhewitson@mac.com> writes:
> Well, here we disagree. Suppose I have a class which encapsulates
> time-series data. Below is a list of the absolute minimum methods one
> would have to process that data. ... 
>     'abs'
>     'acos'
>     'asin'
> ...

Ok, THERE is your problem.  Why do you have separate implementations of
all those functions?  Does the abs of a time series simply mean the abs
of each element of the series?  In that case you want just ONE method,
something like "map", which applies an arbitrary function to all
elements of the series.  Then for time series ts, instead of saying
ts.abs(), you'd say ts.map(abs) where abs is the existing, built-in
absolute value function.  You could similarly say ts.map(acos) etc.
That gets rid of almost all of those methods.
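
A minimal sketch of such a method, assuming a hypothetical `TimeSeries` class that simply wraps a list of values:

```python
import math


class TimeSeries:
    """Hypothetical container; only the pieces needed for the sketch."""

    def __init__(self, values):
        self.values = list(values)

    def map(self, func):
        # Apply func to every element and return a new series.
        return TimeSeries(func(v) for v in self.values)


ts = TimeSeries([-1.0, 0.5, 1.0])
print(ts.map(abs).values)
print(ts.map(math.acos).values)
```

Any one-argument function works here, so `abs`, `math.acos`, `math.sin` and the like need no dedicated method.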



#32642

From: Martin Hewitson <martinhewitson@mac.com>
Date: 2012-11-02 18:40 +0100
Message-ID: <mailman.3213.1351878057.27098.python-list@python.org>
In reply to: #32639

On 2, Nov, 2012, at 06:24 PM, Paul Rubin <no.email@nospam.invalid> wrote:

> Martin Hewitson <martinhewitson@mac.com> writes:
>> Well, here we disagree. Suppose I have a class which encapsulates
>> time-series data. Below is a list of the absolute minimum methods one
>> would have to process that data. ... 
>>   'abs'
>>   'acos'
>>   'asin'
>> ...
> 
> Ok, THERE is your problem.  Why do you have separate implementations of
> all those functions?  Does the abs of a time series simply mean the abs
> of each element of the series?  In that case you want just ONE method,
> something like "map", which applies an arbitrary function to all
> elements of the series.  Then for time series ts, instead of saying
> ts.abs(), you'd say ts.map(abs) where abs is the existing, built-in
> absolute value function.  You could similarly say ts.map(acos) etc.
> That gets rid of almost all of those methods.

Well, because one of the features that the framework will have is to capture history steps (in a tree structure) so that each processing step the user does is tracked. So while methods such as abs(), cos(), etc will eventually just call a built-in method, there will be some house-keeping around them. All that said, as I've been trying to implement this structure, it turns out that in Python, this is more naturally achieved (I think) if each algorithm is implemented as a class, so that each algorithm can return its set of supported parameters for validation against the user inputs and, ultimately, for inclusion in a step in the history tree. Since most of that infrastructure will turn out to be boiler-plate code, it would make sense to have an algorithm base class, which all other algorithms (abs, cos, etc) will inherit from. Then I just need to get my head around the interplay between these algorithm classes and the data classes. Some more prototyping needed.
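
A rough sketch of what such an algorithm base class might look like. Everything here is an assumption made for illustration: the names `Algorithm`, `defaults`, `apply` and `history_step` are hypothetical, and a history step is reduced to a plain tuple.

```python
class Algorithm:
    """Hypothetical base class holding the shared boiler-plate."""
    name = "algorithm"
    defaults = {}  # supported parameters and their default values

    def __init__(self, **params):
        # Validate user inputs against the supported parameter set.
        unknown = set(params) - set(self.defaults)
        if unknown:
            raise ValueError("unsupported parameters: %s" % sorted(unknown))
        self.params = dict(self.defaults, **params)

    def apply(self, values):
        raise NotImplementedError

    def history_step(self):
        # What would be stored in one node of the history tree.
        return (self.name, dict(self.params))


class Scale(Algorithm):
    name = "scale"
    defaults = {"factor": 1.0}

    def apply(self, values):
        return [v * self.params["factor"] for v in values]


alg = Scale(factor=2.0)
print(alg.apply([1.0, 2.5]))
print(alg.history_step())
```

Subclasses then only declare their parameters and the computation itself; validation and history records come from the base class.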

Thanks for the info about map(); this will likely turn out to be very useful, if not at the user level, at least within the framework. Again, a main requirement is that the users should be able to work without knowing much about Python or programming in general; they just get this toolkit and, after minimal training, should be able to do most of what they want in an intuitive way.

Cheers,

Martin






#32647

From: Paul Rubin <no.email@nospam.invalid>
Date: 2012-11-02 11:29 -0700
Message-ID: <7xsj8r97v6.fsf@ruckus.brouhaha.com>
In reply to: #32642

Martin Hewitson <martinhewitson@mac.com> writes:
>> you want just ONE method, something like "map"...
> Well, because one of the features that the framework will have is to
> capture history steps (in a tree structure) so that each processing
> step the user does is tracked. So while methods such as abs(), cos(),
> etc will eventually just call a built-in method, there will be some
> house-keeping around them. 

Make the "map" wrapper do the house-keeping.
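
A minimal sketch of a "map" wrapper that does the house-keeping once, assuming a hypothetical `TimeSeries` class and a deliberately simplified history (a flat tuple of step names rather than the tree structure mentioned above):

```python
import math


class TimeSeries:
    """Hypothetical container that records how it was produced."""

    def __init__(self, values, history=()):
        self.values = list(values)
        self.history = tuple(history)  # names of the steps applied so far

    def map(self, func):
        # House-keeping lives here once, not in every per-function method.
        step = getattr(func, "__name__", repr(func))
        new_values = [func(v) for v in self.values]
        return TimeSeries(new_values, self.history + (step,))


ts = TimeSeries([-4.0, 9.0])
result = ts.map(abs).map(math.sqrt)
print(result.values)
print(result.history)
```

The original series is left untouched, and each derived series carries the chain of steps that produced it.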

> turns out that in Python, this is more naturally achieved (I think) if
> each algorithm is implemented as a class, so that each algorithm can
> return its set of supported parameters for validation against the user
> inputs and, ultimately, for inclusion in a step in the history
> tree. Since most of that infrastructure will turn out to be
> boiler-plate code, it would make sense to have an algorithm base
> class, which all other algorithms (abs, cos, etc) will inherit
> from. 

That sounds like over-use of classes and inheritance.  It's probably
easiest to just use a dictionary with functions in it (untested):

    import math
    from collections import namedtuple

    # One record per function: name, callable, doc string, input validator
    Ts_func = namedtuple('Ts_func', ['name', 'func', 'doc', 'validate'])

    # Registry mapping a function name to its Ts_func record
    all_funcs = {}
    def add_func(name, func, doc, validate):
        all_funcs[name] = Ts_func(name, func, doc, validate)

    add_func('abs', abs, 'absolute value', lambda x: True)
    add_func('acos', math.acos, 'arc cosine', lambda x: abs(x) <= 1)
    ...

You can then look up any of these entries and pass it to your map method.
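
A minimal sketch of that lookup, assuming a hypothetical `TimeSeries.map` that accepts one registry entry and checks every element with the entry's `validate` before applying its `func`. The registry is rebuilt inline so the example runs on its own.

```python
import math
from collections import namedtuple

# Same record layout as in the post above.
Ts_func = namedtuple('Ts_func', ['name', 'func', 'doc', 'validate'])
all_funcs = {
    'abs': Ts_func('abs', abs, 'absolute value', lambda x: True),
    'acos': Ts_func('acos', math.acos, 'arc cosine', lambda x: abs(x) <= 1),
}


class TimeSeries:
    """Hypothetical container; only the pieces needed for the sketch."""

    def __init__(self, values):
        self.values = list(values)

    def map(self, entry):
        # Check every element before applying anything.
        bad = [v for v in self.values if not entry.validate(v)]
        if bad:
            raise ValueError("%s: invalid input %r" % (entry.name, bad))
        return TimeSeries(entry.func(v) for v in self.values)


ts = TimeSeries([-1.0, 2.0])
print(ts.map(all_funcs['abs']).values)
try:
    ts.map(all_funcs['acos'])  # 2.0 is outside the domain of acos
except ValueError as exc:
    print(exc)
```

Adding a new operation is then one more registry entry rather than one more method or class.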


