Re: data validation when creating an object

Started by	Ben Finney <ben+python@benfinney.id.au>
First post	2014-01-16 12:16 +1100
Last post	2014-01-16 10:44 -0600
Articles	8 — 4 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: data validation when creating an object Ben Finney <ben+python@benfinney.id.au> - 2014-01-16 12:16 +1100
    Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-15 23:05 -0500
      Re: data validation when creating an object Ben Finney <ben+python@benfinney.id.au> - 2014-01-16 15:53 +1100
        Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-16 00:05 -0500
      Re: data validation when creating an object Robert Kern <robert.kern@gmail.com> - 2014-01-16 15:46 +0000
        Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-16 08:18 -0800
          Re: data validation when creating an object Robert Kern <robert.kern@gmail.com> - 2014-01-16 16:58 +0000
      Re: data validation when creating an object Skip Montanaro <skip@pobox.com> - 2014-01-16 10:44 -0600

#64033 — Re: data validation when creating an object

From	Ben Finney <ben+python@benfinney.id.au>
Date	2014-01-16 12:16 +1100
Subject	Re: data validation when creating an object
Message-ID	<mailman.5555.1389834993.18130.python-list@python.org>

Rita <rmorgan466@gmail.com> writes:

> I would like to do some data validation when its going to a class.
>
> class Foo(object):
>   def __init__(self):
>     pass
>
> I know its frowned upon to do work in the __init__() method and only
> declarations should be there.

Who says it's frowned on to do work in the initialiser? Where are they
saying it? That seems over-broad, I'd like to read the context of that
advice.

> So, should i create a function called validateData(self) inside foo?

If you're going to create it, ‘validate_data’ would be a better name
(because it's PEP 8 conformant).

> I would call the object like this
>
> x=Foo()
> x.validateData()

You should also be surrounding the “=” operator with spaces (PEP 8
again) for readability.

> Is this the preferred way? Is there a way I can run validateData()
> automatically, maybe put it in __init__?

It depends entirely on what is being done in those functions.

But in general, we tend not to write our functions small enough or
focussed enough. So general advice would be that, if you think the
function is going to be too long and/or doing too much, you're probably
right :-)
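
To make the question concrete, here is a minimal sketch of running the check automatically from the initialiser. The `value` argument and the "non-negative number" rule are assumptions made up for illustration; they are not from the original post.

```python
class Foo(object):
    def __init__(self, value):
        self.value = value
        # Run the check during initialisation so that an invalid
        # instance is never handed back to the caller.
        self.validate_data()

    def validate_data(self):
        # Hypothetical rule: value must be a non-negative number.
        if not isinstance(self.value, (int, float)):
            raise TypeError("value must be a number")
        if self.value < 0:
            raise ValueError("value must be non-negative")


x = Foo(3)  # validated on construction; no separate call needed
```

With this shape, `Foo(-1)` raises `ValueError` immediately, so the caller cannot forget the second step.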

-- 
 \         “Nature hath given men one tongue but two ears, that we may |
  `\          hear from others twice as much as we speak.” —Epictetus, |
_o__)                                                      _Fragments_ |
Ben Finney

#64048

From	Roy Smith <roy@panix.com>
Date	2014-01-15 23:05 -0500
Message-ID	<roy-BA09BB.23054615012014@news.panix.com>
In reply to	#64033

Rita <rmorgan466@gmail.com> writes:
>> I know its frowned upon to do work in the __init__() method and only
>> declarations should be there.

In article <mailman.5555.1389834993.18130.python-list@python.org>,
 Ben Finney <ben+python@benfinney.id.au> wrote:

> Who says it's frowned on to do work in the initialiser? Where are they
> saying it? That seems over-broad, I'd like to read the context of that
> advice.

Weird, I was just having this conversation at work earlier this week.

There are some people who advocate that C++ constructors should not do a 
lot of work and/or should be incapable of throwing exceptions.  The pros 
and cons of that argument are largely C++ specific.  Here's a Stack 
Overflow thread which covers most of the usual arguments on both sides:

http://stackoverflow.com/questions/293967/how-much-work-should-be-done-in-a-constructor

But, Python is not C++.  I suspect the people who argue for __init__() 
not doing much are extrapolating a C++ pattern to other languages 
without fully understanding the reason why.

That being said, I've been on a tear lately, trying to get our unit test 
suite to run faster.  I came across one slow test which had an 
interesting twist.  The class being tested had an __init__() method 
which read over 900,000 records from a database and took something like 
5-10 seconds to run.  Man, talk about heavy-weight constructors :-)

#64050

From	Ben Finney <ben+python@benfinney.id.au>
Date	2014-01-16 15:53 +1100
Message-ID	<mailman.5567.1389848051.18130.python-list@python.org>
In reply to	#64048

Roy Smith <roy@panix.com> writes:

>  Ben Finney <ben+python@benfinney.id.au> wrote:
>
> > Who says it's frowned on to do work in the initialiser? Where are they
> > saying it? That seems over-broad, I'd like to read the context of that
> > advice.
>
> There are some people who advocate that C++ constructors should not do
> a lot of work and/or should be incapable of throwing exceptions. The
> pros and cons of that argument are largely C++ specific. […]
>
> But, Python is not C++. I suspect the people who argue for __init__()
> not doing much are extrapolating a C++ pattern to other languages
> without fully understanding the reason why.

Even simpler: They are mistaken in what the constructor is named, in
Python.

Python classes have the constructor, ‘__new__’. I would agree with
advice not to do anything but allocate the resources for a new instance
in the constructor.

Indeed, the constructor from ‘object’ does a good enough job that the
vast majority of Python classes never need a custom constructor at all.

(This is probably why many beginning programmers are confused about what
the constructor is called: They've never seen a class with its own
constructor!)

Python instances have an initialiser, ‘__init__’. That function is for
setting up the specific instance for later use. This is commonly
over-ridden and many classes define a custom initialiser, which normally
does some amount of work.

I don't think ‘__init__’ is subject to the conventions of a constructor,
because *‘__init__’ is not a constructor*.
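
A small sketch of the two steps, assuming a throwaway class named `Tracked` and a module-level `calls` list (both hypothetical, used only to record the order in which the methods run):

```python
calls = []


class Tracked(object):
    def __new__(cls, *args, **kwargs):
        # The constructor: creates and returns the new, empty instance.
        calls.append("__new__")
        return super(Tracked, cls).__new__(cls)

    def __init__(self, value):
        # The initialiser: receives the instance that already exists
        # and sets it up for later use.
        calls.append("__init__")
        self.value = value


t = Tracked(42)
print(calls)  # ['__new__', '__init__']
```

By the time `__init__` runs, `self` already exists; the initialiser only fills it in.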

-- 
 \        “Absurdity, n. A statement or belief manifestly inconsistent |
  `\            with one's own opinion.” —Ambrose Bierce, _The Devil's |
_o__)                                                Dictionary_, 1906 |
Ben Finney

#64053

From	Roy Smith <roy@panix.com>
Date	2014-01-16 00:05 -0500
Message-ID	<roy-FED514.00050716012014@news.panix.com>
In reply to	#64050

In article <mailman.5567.1389848051.18130.python-list@python.org>,
 Ben Finney <ben+python@benfinney.id.au> wrote:

> Roy Smith <roy@panix.com> writes:
> > But, Python is not C++. I suspect the people who argue for __init__()
> > not doing much are extrapolating a C++ pattern to other languages
> > without fully understanding the reason why.
> 
> Even simpler: They are mistaken in what the constructor is named, in
> Python.
> 
> Python classes have the constructor, ‘__new__’. I would agree with
> advice not to do anything but allocate the resources for a new instance
> in the constructor.

I've always found this distinction to be somewhat silly.

C++ constructors are also really just initializers.  Before your 
constructor is called, something else (operator new, at least for 
objects in the heap) has already allocated memory for the object.  It's 
the constructor's job to initialize the data.  That's really very much 
the same distinction as between __new__() and __init__().

#64086

From	Robert Kern <robert.kern@gmail.com>
Date	2014-01-16 15:46 +0000
Message-ID	<mailman.5590.1389887225.18130.python-list@python.org>
In reply to	#64048

On 2014-01-16 04:05, Roy Smith wrote:
> Rita <rmorgan466@gmail.com> writes:
>>> I know its frowned upon to do work in the __init__() method and only
>>> declarations should be there.
>
>
> In article <mailman.5555.1389834993.18130.python-list@python.org>,
>   Ben Finney <ben+python@benfinney.id.au> wrote:
>
>> Who says it's frowned on to do work in the initialiser? Where are they
>> saying it? That seems over-broad, I'd like to read the context of that
>> advice.
>
> Weird, I was just having this conversation at work earlier this week.
>
> There are some people who advocate that C++ constructors should not do a
> lot of work and/or should be incapable of throwing exceptions.  The pros
> and cons of that argument are largely C++ specific.  Here's a Stack
> Overflow thread which covers most of the usual arguments on both sides:
>
> http://stackoverflow.com/questions/293967/how-much-work-should-be-done-in-a-constructor
>
> But, Python is not C++.  I suspect the people who argue for __init__()
> not doing much are extrapolating a C++ pattern to other languages
> without fully understanding the reason why.

I'm one of those people who tend to argue this, but my limited experience with 
C++ does not inform my opinion one way or the other.

I prefer to keep my __init__() methods as dumb as possible to retain the 
flexibility to construct my objects in different ways. Sure, it's convenient to, 
say, pass a filename and have the __init__() open() it for me. But then I'm 
stuck with only being able to create this object with a true, named file on 
disk. I can't create it with a StringIO for testing, or by opening a file and 
seeking to a specific spot where the relevant data starts, etc. I can keep the 
flexibility and convenience by keeping __init__() dumb and relegating various 
smarter and more convenient ways to instantiate the object to classmethods.

Which isn't to say that "smart" or "heavy" __init__()s don't have their place 
for some kinds of objects. I just think that dumb __init__()s should be the default.

That said, what the OP asks about, validating data in the __init__() is 
perfectly fine, IMO. My beef isn't so much with the raw *amount* of stuff done 
but how much you can code yourself into a corner by making limiting assumptions. 
So from one of the "do nothing in your __init__()" crowd, I say "well, I didn't 
really mean *nothing*...."

> That being said, I've been on a tear lately, trying to get our unit test
> suite to run faster.  I came across one slow test which had an
> interesting twist.  The class being tested had an __init__() method
> which read over 900,000 records from a database and took something like
> 5-10 seconds to run.  Man, talk about heavy-weight constructors :-)

Indeed.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

#64087

From	Roy Smith <roy@panix.com>
Date	2014-01-16 08:18 -0800
Message-ID	<6a2d342e-78b6-45b5-a8b4-9c5767edc8b1@googlegroups.com>
In reply to	#64086

On Thursday, January 16, 2014 10:46:10 AM UTC-5, Robert Kern wrote:

> I prefer to keep my __init__() methods as dumb as possible to retain the 
> flexibility to construct my objects in different ways. Sure, it's convenient to, 
> say, pass a filename and have the __init__() open() it for me. But then I'm 
> stuck with only being able to create this object with a true, named file on 
> disk. I can't create it with a StringIO for testing, or by opening a file and 
> seeking to a specific spot where the relevant data starts, etc. I can keep the 
> flexibility and convenience by keeping __init__() dumb and relegating various 
> smarter and more convenient ways to instantiate the object to classmethods.

There are two distinct things being discussed here.

Passing a file-like object rather than a filename certainly gives you flexibility.  But that's orthogonal to how much work should be done in the constructor.  Consider this class:

class DataSlurper:
    def __init__(self):
        self.slurpee = None

    def attach_slurpee(self, slurpee):
        self.slurpee = slurpee

    def slurp(self):
        for line in self.slurpee:
            pass  # whatever

This exhibits the nice behavior you describe; you can pass it any iterable, not just a file, so you have a lot more flexibility.  But, it's also exhibiting what many people call the "two-phase constructor" anti-pattern.  When you construct an instance of this class, it's not usable until you call attach_slurpee(), so why not just do that in the constructor?
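
A minimal sketch of how the two-phase version fails when the second step is skipped. The `slurp()` body here simply collects the lines into a list; that is an assumption standing in for "whatever".

```python
class DataSlurper:
    def __init__(self):
        self.slurpee = None

    def attach_slurpee(self, slurpee):
        self.slurpee = slurpee

    def slurp(self):
        # Placeholder body: collect every line into a list.
        return [line for line in self.slurpee]


slurper = DataSlurper()
try:
    slurper.slurp()  # attach_slurpee() was never called
except TypeError as exc:
    print("not usable yet:", exc)

slurper.attach_slurpee(["a\n", "b\n"])
lines = slurper.slurp()  # works only after the second phase
```

The instance exists after `DataSlurper()` but cannot do its job until `attach_slurpee()` has run, and nothing forces the caller to make that call.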

#64091

From	Robert Kern <robert.kern@gmail.com>
Date	2014-01-16 16:58 +0000
Message-ID	<mailman.5592.1389891508.18130.python-list@python.org>
In reply to	#64087

On 2014-01-16 16:18, Roy Smith wrote:
> On Thursday, January 16, 2014 10:46:10 AM UTC-5, Robert Kern wrote:
>
>> I prefer to keep my __init__() methods as dumb as possible to retain the
>> flexibility to construct my objects in different ways. Sure, it's convenient to,
>> say, pass a filename and have the __init__() open() it for me. But then I'm
>> stuck with only being able to create this object with a true, named file on
>> disk. I can't create it with a StringIO for testing, or by opening a file and
>> seeking to a specific spot where the relevant data starts, etc. I can keep the
>> flexibility and convenience by keeping __init__() dumb and relegating various
>> smarter and more convenient ways to instantiate the object to classmethods.
>
> There's two distinct things being discussed here.
>
> The idea of passing a file-like object vs. a filename gives you flexibility, that's for sure.  But, that's orthogonal to how much work should be done in the constructor.  Consider this class:

Where the two get conflated is that both lead to advice that looks the same (or 
at least can get interpreted the same by newbies who are trying to learn and 
don't have the experience to pick out the subtleties): "do nothing in __init__". 
That's why I am trying to clarify where this advice might be coming from and why 
at least one version of it may be valid.

> class DataSlurper:
>      def __init__(self):
>          self.slurpee = None
>
>      def attach_slurpee(self, slurpee):
>          self.slurpee = slurpee
>
>      def slurp(self):
>          for line in self.slurpee:
>              # whatever
>
> This exhibits the nice behavior you describe; you can pass it any iterable, not just a file, so you have a lot more flexibility.  But, it's also exhibiting what many people call the "two-phase constructor" anti-pattern.  When you construct an instance of this class, it's not usable until you call attach_slurpee(), so why not just do that in the constructor?

That's where my recommendation of classmethods comes in. The result of __init__() 
should always be usable. It's just that its arguments may not be as convenient 
as you like because you pass in objects that are closer to the internal 
representation than you normally want to deal with (e.g. file objects instead of 
filenames). You make additional constructors (initializers, whatever) as 
classmethods to restore convenience.


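import urllib.request


class DataSlurper:
    def __init__(self, slurpee):
        # Accept any iterable of lines (file object, StringIO, ...).
        self.slurpee = slurpee

    @classmethod
    def fromfile(cls, filename):
        # Convenience constructor: open the named file for the caller.
        slurpee = open(filename)
        return cls(slurpee)

    @classmethod
    def fromurl(cls, url):
        # Convenience constructor: fetch the URL for the caller.
        # (urllib.urlopen on Python 2; urllib.request.urlopen on Python 3.)
        slurpee = urllib.request.urlopen(url)
        return cls(slurpee)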
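
A usage sketch of why the dumb `__init__()` pays off in tests. The `slurp()` method and its line-stripping behaviour are assumptions added for illustration; only `__init__()` and `fromfile()` follow the class above.

```python
import io


class DataSlurper:
    def __init__(self, slurpee):
        self.slurpee = slurpee

    @classmethod
    def fromfile(cls, filename):
        return cls(open(filename))

    def slurp(self):
        # Hypothetical body: return the lines without trailing newlines.
        return [line.rstrip("\n") for line in self.slurpee]


# In a test, an in-memory stream stands in for a file on disk.
slurper = DataSlurper(io.StringIO("first\nsecond\n"))
print(slurper.slurp())  # ['first', 'second']

# The caller can also position the stream before handing it over,
# here by consuming a header line first.
stream = io.StringIO("header\nfirst\nsecond\n")
stream.readline()
print(DataSlurper(stream).slurp())  # ['first', 'second']
```

Neither case needs a named file, yet `DataSlurper.fromfile(path)` remains available for the common one.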

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

#64089

From	Skip Montanaro <skip@pobox.com>
Date	2014-01-16 10:44 -0600
Message-ID	<mailman.5591.1389890666.18130.python-list@python.org>
In reply to	#64048

I suspect when best to validate inputs depends on when they
come in, and what the cost is of having objects with invalid
state. If the input is something that is passed along when
the object is instantiated, you kind of have to validate in
__init__ or __new__, right?

Let's create a stupid example:

class Point(object):
    def __init__(self, coordinates):
        self.x, self.y, self.z = coordinates

That's kind of self-validating. If you pass something that
doesn't quack like a three-element sequence, the program
will crash. OTOH, Point("abc") will appear to work until you
expect x, y and z to act like numbers. If you can't tolerate
that, then a __new__ method allows you to validate arguments
before creating a new Point instance. You might also allow
one- or two-element tuples:

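    def __new__(cls, coordinates):
        # ... convert to tuple, then ...
        # ... verify that all elements are numbers, then ...

        if len(coordinates) > 3:
            raise ValueError("Expect 1-, 2-, or 3-element tuple")
        if len(coordinates) < 2:
            coordinates += (0.0,)
        if len(coordinates) < 3:
            coordinates += (0.0,)
        # cls(coordinates) would re-enter __new__ and recurse, so allocate
        # through the base class instead. __init__ would still receive the
        # caller's original argument, so this version assigns the padded
        # values here and replaces the __init__ above.
        self = super(Point, cls).__new__(cls)
        self.x, self.y, self.z = coordinates
        return self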

Validating in __new__ will allow you to catch problems
sooner, and give you the option of returning some sort of
sentinel instead of just raising an exception, though that
is probably not generally good practice. This will catch
Point("abc") quickly, with a stack trace pointing to the
offending code.
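
A self-contained sketch of that idea. The `numbers.Real` check and the choice to set the attributes inside `__new__` (with no separate `__init__`) are assumptions made so the example runs on its own.

```python
import numbers


class Point(object):
    def __new__(cls, coordinates):
        coordinates = tuple(coordinates)
        if not 1 <= len(coordinates) <= 3:
            raise ValueError("Expect 1-, 2-, or 3-element tuple")
        if not all(isinstance(c, numbers.Real) for c in coordinates):
            raise TypeError("Coordinates must be numbers")
        # Pad missing trailing coordinates with zeros.
        coordinates += (0.0,) * (3 - len(coordinates))
        # Allocate through the base class; no instance exists until
        # every check above has passed.
        self = super(Point, cls).__new__(cls)
        self.x, self.y, self.z = coordinates
        return self


p = Point((1.0, 2.0))
print(p.x, p.y, p.z)  # 1.0 2.0 0.0

try:
    Point("abc")
except TypeError as exc:
    print("rejected:", exc)  # rejected: Coordinates must be numbers
```

Here `Point("abc")` fails at the call site instead of later, when `x`, `y` and `z` are first used as numbers.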

Of course, you might need to validate any other inputs which
appear after instantiation:

    def move(self, xdelta=0.0, ydelta=0.0, zdelta=0.0):
        self.x += xdelta
        self.y += ydelta
        self.z += zdelta

Whether you need more feedback than an exception might give
you here is debatable, but there are likely plenty of
situations where you need to explicitly validate user input
before using it (think of accepting string data from the net
which you plan to feed to your SQL database...). Those sorts
of validation steps are beyond the scope of this thread, and
probably much better handled by platforms further up the
software stack (like Django or SQLAlchemy).

Skip
