Groups > comp.lang.python > #64033 > unrolled thread
| Started by | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| First post | 2014-01-16 12:16 +1100 |
| Last post | 2014-01-16 10:44 -0600 |
| Articles | 8 — 4 participants |
This discussion began before the indexed window, so earlier articles aren't shown. The article credited under "Started by" above is the oldest one visible, not the original post.
Re: data validation when creating an object Ben Finney <ben+python@benfinney.id.au> - 2014-01-16 12:16 +1100
Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-15 23:05 -0500
Re: data validation when creating an object Ben Finney <ben+python@benfinney.id.au> - 2014-01-16 15:53 +1100
Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-16 00:05 -0500
Re: data validation when creating an object Robert Kern <robert.kern@gmail.com> - 2014-01-16 15:46 +0000
Re: data validation when creating an object Roy Smith <roy@panix.com> - 2014-01-16 08:18 -0800
Re: data validation when creating an object Robert Kern <robert.kern@gmail.com> - 2014-01-16 16:58 +0000
Re: data validation when creating an object Skip Montanaro <skip@pobox.com> - 2014-01-16 10:44 -0600
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2014-01-16 12:16 +1100 |
| Subject | Re: data validation when creating an object |
| Message-ID | <mailman.5555.1389834993.18130.python-list@python.org> |
Rita <rmorgan466@gmail.com> writes:

> I would like to do some data validation when it's going to a class.
>
> class Foo(object):
>     def __init__(self):
>         pass
>
> I know it's frowned upon to do work in the __init__() method and only
> declarations should be there.

Who says it's frowned on to do work in the initialiser? Where are they
saying it? That seems over-broad, I'd like to read the context of that
advice.

> So, should I create a function called validateData(self) inside Foo?

If you're going to create it, ‘validate_data’ would be a better name
(because it's PEP 8 conformant).

> I would call the object like this
>
> x=Foo()
> x.validateData()

You should also be surrounding the “=” operator with spaces (PEP 8
again) for readability.

> Is this the preferred way? Is there a way I can run validateData()
> automatically, maybe put it in __init__?

It depends entirely on what is being done in those functions.

But in general, we tend not to write our functions small enough or
focussed enough. So general advice would be that, if you think the
function is going to be too long and/or doing too much, you're probably
right :-)

-- 
 \     “Nature hath given men one tongue but two ears, that we may |
  `\     hear from others twice as much as we speak.” —Epictetus, |
_o__)                                                _Fragments_ |
Ben Finney
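A minimal sketch of what Rita asks about: the `__init__` calls a PEP 8-named `validate_data()` so that every new instance is checked automatically. The `name` and `age` attributes and the specific checks are hypothetical and serve only as illustration.

```python
class Foo(object):
    """Hypothetical class whose attributes are checked on creation."""

    def __init__(self, name, age):
        self.name = name
        self.age = age
        # Run the check automatically; callers never need a second step.
        self.validate_data()

    def validate_data(self):
        # Raise rather than return a flag, so an invalid Foo never escapes.
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.age, int) or self.age < 0:
            raise ValueError("age must be a non-negative integer")


x = Foo("Rita", 30)  # passes validation
try:
    Foo("", -1)
except ValueError as exc:
    print(exc)  # name must be a non-empty string
```

Because `validate_data()` remains a separate method, it can also be re-run later if the attributes are changed after creation.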
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-01-15 23:05 -0500 |
| Message-ID | <roy-BA09BB.23054615012014@news.panix.com> |
| In reply to | #64033 |
Rita <rmorgan466@gmail.com> writes:
>> I know it's frowned upon to do work in the __init__() method and only
>> declarations should be there.

In article <mailman.5555.1389834993.18130.python-list@python.org>,
 Ben Finney <ben+python@benfinney.id.au> wrote:

> Who says it's frowned on to do work in the initialiser? Where are they
> saying it? That seems over-broad, I'd like to read the context of that
> advice.

Weird, I was just having this conversation at work earlier this week.

There are some people who advocate that C++ constructors should not do a
lot of work and/or should be incapable of throwing exceptions. The pros
and cons of that argument are largely C++ specific. Here's a Stack
Overflow thread which covers most of the usual arguments on both sides:

http://stackoverflow.com/questions/293967/how-much-work-should-be-done-in-a-constructor

But, Python is not C++. I suspect the people who argue for __init__()
not doing much are extrapolating a C++ pattern to other languages
without fully understanding the reason why.

That being said, I've been on a tear lately, trying to get our unit test
suite to run faster. I came across one slow test which had an
interesting twist. The class being tested had an __init__() method
which read over 900,000 records from a database and took something like
5-10 seconds to run. Man, talk about heavy-weight constructors :-)
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2014-01-16 15:53 +1100 |
| Message-ID | <mailman.5567.1389848051.18130.python-list@python.org> |
| In reply to | #64048 |
Roy Smith <roy@panix.com> writes:

> Ben Finney <ben+python@benfinney.id.au> wrote:
>
> > Who says it's frowned on to do work in the initialiser? Where are they
> > saying it? That seems over-broad, I'd like to read the context of that
> > advice.
>
> There are some people who advocate that C++ constructors should not do
> a lot of work and/or should be incapable of throwing exceptions. The
> pros and cons of that argument are largely C++ specific. […]
>
> But, Python is not C++. I suspect the people who argue for __init__()
> not doing much are extrapolating a C++ pattern to other languages
> without fully understanding the reason why.

Even simpler: They are mistaken in what the constructor is named, in
Python.

Python classes have the constructor, ‘__new__’. I would agree with
advice not to do anything but allocate the resources for a new instance
in the constructor.

Indeed, the constructor from ‘object’ does a good enough job that the
vast majority of Python classes never need a custom constructor at all.
(This is probably why many beginning programmers are confused about what
the constructor is called: They've never seen a class with its own
constructor!)

Python instances have an initialiser, ‘__init__’. That function is for
setting up the specific instance for later use. This is commonly
over-ridden and many classes define a custom initialiser, which normally
does some amount of work.

I don't think ‘__init__’ is subject to the conventions of a constructor,
because *‘__init__’ is not a constructor*.

-- 
 \        “Absurdity, n. A statement or belief manifestly inconsistent |
  `\        with one's own opinion.” —Ambrose Bierce, _The Devil's |
_o__)                                            Dictionary_, 1906 |
Ben Finney
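The following small sketch makes Ben's distinction concrete by recording when each method runs. `__new__` allocates and returns the instance, and Python then calls `__init__` on that instance. The class name `Traced` and the `calls` list are hypothetical.

```python
calls = []


class Traced(object):
    def __new__(cls, value):
        # The constructor: allocate and return a new, empty instance.
        calls.append("__new__")
        return super(Traced, cls).__new__(cls)

    def __init__(self, value):
        # The initialiser: set up the instance __new__ already created.
        calls.append("__init__")
        self.value = value


t = Traced(42)
print(calls)  # ['__new__', '__init__']
```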
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-01-16 00:05 -0500 |
| Message-ID | <roy-FED514.00050716012014@news.panix.com> |
| In reply to | #64050 |
In article <mailman.5567.1389848051.18130.python-list@python.org>,
 Ben Finney <ben+python@benfinney.id.au> wrote:

> Roy Smith <roy@panix.com> writes:
> > But, Python is not C++. I suspect the people who argue for __init__()
> > not doing much are extrapolating a C++ pattern to other languages
> > without fully understanding the reason why.
>
> Even simpler: They are mistaken in what the constructor is named, in
> Python.
>
> Python classes have the constructor, ‘__new__’. I would agree with
> advice not to do anything but allocate the resources for a new instance
> in the constructor.

I've always found this distinction to be somewhat silly. C++
constructors are also really just initializers. Before your constructor
is called, something else (operator new, at least for objects on the
heap) has already allocated memory for the object. It's the
constructor's job to initialize the data. That's really very much the
same distinction as between __new__() and __init__().
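You can see Roy's point that allocation and initialisation are separate steps by performing them by hand. For an ordinary class, `Widget(3)` behaves roughly like the two calls below. This is a simplification: the real work happens in `type.__call__`, which also skips `__init__` if `__new__` returns something that isn't an instance of the class. `Widget` is a hypothetical example class.

```python
class Widget(object):
    def __init__(self, size):
        self.size = size


# Step 1: allocate an uninitialised instance (like C++ operator new).
w = Widget.__new__(Widget)
print(hasattr(w, "size"))  # False: nothing has been initialised yet

# Step 2: initialise it (like the body of a C++ constructor).
Widget.__init__(w, 3)
print(w.size)  # 3

# The normal spelling performs both steps in one call.
print(Widget(3).size == w.size)  # True
```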
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2014-01-16 15:46 +0000 |
| Message-ID | <mailman.5590.1389887225.18130.python-list@python.org> |
| In reply to | #64048 |
On 2014-01-16 04:05, Roy Smith wrote:
> Rita <rmorgan466@gmail.com> writes:
>>> I know it's frowned upon to do work in the __init__() method and only
>>> declarations should be there.
>
>
> In article <mailman.5555.1389834993.18130.python-list@python.org>,
>  Ben Finney <ben+python@benfinney.id.au> wrote:
>
>> Who says it's frowned on to do work in the initialiser? Where are they
>> saying it? That seems over-broad, I'd like to read the context of that
>> advice.
>
> Weird, I was just having this conversation at work earlier this week.
>
> There are some people who advocate that C++ constructors should not do a
> lot of work and/or should be incapable of throwing exceptions. The pros
> and cons of that argument are largely C++ specific. Here's a Stack
> Overflow thread which covers most of the usual arguments on both sides:
>
> http://stackoverflow.com/questions/293967/how-much-work-should-be-done-in-a-constructor
>
> But, Python is not C++. I suspect the people who argue for __init__()
> not doing much are extrapolating a C++ pattern to other languages
> without fully understanding the reason why.

I'm one of those people who tends to argue this, but my limited
experience with C++ does not inform my opinion one way or the other.

I prefer to keep my __init__() methods as dumb as possible to retain the
flexibility to construct my objects in different ways. Sure, it's convenient to,
say, pass a filename and have the __init__() open() it for me. But then I'm
stuck with only being able to create this object with a true, named file on
disk. I can't create it with a StringIO for testing, or by opening a file and
seeking to a specific spot where the relevant data starts, etc. I can keep the
flexibility and convenience by keeping __init__() dumb and relegating various
smarter and more convenient ways to instantiate the object to classmethods.

Which isn't to say that "smart" or "heavy" __init__()s don't have their
place for some kinds of objects. I just think that dumb __init__()s
should be the default.

That said, what the OP asks about, validating data in the __init__(), is
perfectly fine, IMO. My beef isn't so much with the raw *amount* of
stuff done but how much you can code yourself into a corner by making
limiting assumptions. So from one of the "do nothing in your __init__()"
crowd, I say "well, I didn't really mean *nothing*...."

> That being said, I've been on a tear lately, trying to get our unit test
> suite to run faster. I came across one slow test which had an
> interesting twist. The class being tested had an __init__() method
> which read over 900,000 records from a database and took something like
> 5-10 seconds to run. Man, talk about heavy-weight constructors :-)

Indeed.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
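Here is a minimal sketch of the flexibility Robert describes. The hypothetical `RecordReader.__init__` takes an already-open file-like object rather than a filename, so a test can hand it an in-memory `io.StringIO`, and a caller can skip ahead in a stream before passing it in.

```python
import io


class RecordReader(object):
    """Hypothetical reader; __init__ only stores the stream it is given."""

    def __init__(self, stream):
        self.stream = stream

    def records(self):
        # Yield one stripped, non-empty line per record.
        for line in self.stream:
            line = line.strip()
            if line:
                yield line


# In a test, no file on disk is needed.
reader = RecordReader(io.StringIO("alpha\n\nbeta\n"))
print(list(reader.records()))  # ['alpha', 'beta']

# Or start partway through a stream, e.g. after skipping a header line.
stream = io.StringIO("HEADER\ngamma\n")
stream.readline()
print(list(RecordReader(stream).records()))  # ['gamma']
```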
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-01-16 08:18 -0800 |
| Message-ID | <6a2d342e-78b6-45b5-a8b4-9c5767edc8b1@googlegroups.com> |
| In reply to | #64086 |
On Thursday, January 16, 2014 10:46:10 AM UTC-5, Robert Kern wrote:
> I prefer to keep my __init__() methods as dumb as possible to retain the
> flexibility to construct my objects in different ways. Sure, it's convenient to,
> say, pass a filename and have the __init__() open() it for me. But then I'm
> stuck with only being able to create this object with a true, named file on
> disk. I can't create it with a StringIO for testing, or by opening a file and
> seeking to a specific spot where the relevant data starts, etc. I can keep the
> flexibility and convenience by keeping __init__() dumb and relegating various
> smarter and more convenient ways to instantiate the object to classmethods.
There are two distinct things being discussed here.
The idea of passing a file-like object vs. a filename gives you flexibility, that's for sure. But, that's orthogonal to how much work should be done in the constructor. Consider this class:
```python
class DataSlurper:
    def __init__(self):
        self.slurpee = None

    def attach_slurpee(self, slurpee):
        self.slurpee = slurpee

    def slurp(self):
        for line in self.slurpee:
            pass  # whatever
```
This exhibits the nice behavior you describe; you can pass it any iterable, not just a file, so you have a lot more flexibility. But, it's also exhibiting what many people call the "two-phase constructor" anti-pattern. When you construct an instance of this class, it's not usable until you call attach_slurpee(), so why not just do that in the constructor?
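This quick sketch shows why the two-phase version is fragile. Nothing stops a caller from using the object between the two phases, and the failure shows up far from the mistake. The class is Roy's `DataSlurper`, with `slurp()` filled in hypothetically to collect lines.

```python
class DataSlurper:
    def __init__(self):
        self.slurpee = None

    def attach_slurpee(self, slurpee):
        self.slurpee = slurpee

    def slurp(self):
        # Hypothetical body: gather every line into a list.
        return [line for line in self.slurpee]


s = DataSlurper()
try:
    s.slurp()  # forgot attach_slurpee()
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable

s.attach_slurpee(["a", "b"])
print(s.slurp())  # ['a', 'b']
```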
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2014-01-16 16:58 +0000 |
| Message-ID | <mailman.5592.1389891508.18130.python-list@python.org> |
| In reply to | #64087 |
On 2014-01-16 16:18, Roy Smith wrote:
> On Thursday, January 16, 2014 10:46:10 AM UTC-5, Robert Kern wrote:
>
>> I prefer to keep my __init__() methods as dumb as possible to retain the
>> flexibility to construct my objects in different ways. Sure, it's convenient to,
>> say, pass a filename and have the __init__() open() it for me. But then I'm
>> stuck with only being able to create this object with a true, named file on
>> disk. I can't create it with a StringIO for testing, or by opening a file and
>> seeking to a specific spot where the relevant data starts, etc. I can keep the
>> flexibility and convenience by keeping __init__() dumb and relegating various
>> smarter and more convenient ways to instantiate the object to classmethods.
>
> There are two distinct things being discussed here.
>
> The idea of passing a file-like object vs. a filename gives you flexibility, that's for sure. But, that's orthogonal to how much work should be done in the constructor. Consider this class:
Where the two get conflated is that both lead to advice that looks the same (or
at least can get interpreted the same by newbies who are trying to learn and
don't have the experience to pick out the subtleties): "do nothing in __init__".
That's why I am trying to clarify where this advice might be coming from and why
at least one version of it may be valid.
> class DataSlurper:
>     def __init__(self):
>         self.slurpee = None
>
>     def attach_slurpee(self, slurpee):
>         self.slurpee = slurpee
>
>     def slurp(self):
>         for line in self.slurpee:
>             pass  # whatever
>
> This exhibits the nice behavior you describe; you can pass it any iterable, not just a file, so you have a lot more flexibility. But, it's also exhibiting what many people call the "two-phase constructor" anti-pattern. When you construct an instance of this class, it's not usable until you call attach_slurpee(), so why not just do that in the constructor?
That's where my recommendation of classmethods comes in. The result of __init__()
should always be usable. It's just that its arguments may not be as convenient
as you'd like because you pass in objects that are closer to the internal
representation than you normally want to deal with (e.g. file objects instead of
filenames). You make additional constructors (initializers, whatever) as
classmethods to restore convenience.
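```python
import urllib.request  # Python 3; the 2014 original used urllib.urlopen


class DataSlurper:
    def __init__(self, slurpee):
        # Dumb initializer: any iterable of lines, immediately usable.
        self.slurpee = slurpee

    @classmethod
    def fromfile(cls, filename):
        # Convenience constructor for a named file on disk.
        slurpee = open(filename)
        return cls(slurpee)

    @classmethod
    def fromurl(cls, url):
        # Convenience constructor for a URL.
        slurpee = urllib.request.urlopen(url)
        return cls(slurpee)
```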
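The following usage sketch applies Robert's alternate-constructor pattern and assumes a `slurp()` method that returns lines without their newlines. That method is hypothetical and does not appear in his post. The same class accepts an in-memory stream directly or a real file via `fromfile()`. `fromurl()` is omitted so that the example runs without network access.

```python
import io
import os
import tempfile


class DataSlurper:
    def __init__(self, slurpee):
        # __init__ stays dumb: it just stores an iterable of lines.
        self.slurpee = slurpee

    @classmethod
    def fromfile(cls, filename):
        # Convenience constructor: open the named file, then delegate.
        return cls(open(filename))

    def slurp(self):
        # Hypothetical method: collect each line without its newline.
        return [line.rstrip("\n") for line in self.slurpee]


# Direct construction with an in-memory stream (handy in tests).
print(DataSlurper(io.StringIO("x\ny\n")).slurp())  # ['x', 'y']

# Convenience construction from a real file on disk.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("one\ntwo\n")
slurper = DataSlurper.fromfile(path)
print(slurper.slurp())  # ['one', 'two']
slurper.slurpee.close()
os.remove(path)
```

With this design, only the classmethods know about filenames or URLs, and `__init__` always yields a usable object.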
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
| From | Skip Montanaro <skip@pobox.com> |
|---|---|
| Date | 2014-01-16 10:44 -0600 |
| Message-ID | <mailman.5591.1389890666.18130.python-list@python.org> |
| In reply to | #64048 |
I suspect the best time to validate inputs depends on when
they come in, and on the cost of having objects with invalid
state. If the input is something that is passed along when
the object is instantiated, you kind of have to validate in
__init__ or __new__, right?
Let's create a stupid example:
```python
class Point(object):
    def __init__(self, coordinates):
        self.x, self.y, self.z = coordinates
```
That's kind of self-validating. If you pass something that
doesn't quack like a three-element sequence, the program
will crash. OTOH, Point("abc") will appear to work until you
expect x, y and z to act like numbers. If you can't tolerate
that, then a __new__ method allows you to validate arguments
before creating a new Point instance. You might also allow
one- or two-element tuples:
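```python
import numbers

class Point(object):
    # No __init__: Python would call it with the original, unpadded
    # argument, so __new__ does all the work here.
    def __new__(cls, coordinates):
        coordinates = tuple(coordinates)  # so len() and += work
        # Verify that all elements are numbers.
        if not all(isinstance(c, numbers.Number) for c in coordinates):
            raise TypeError("Coordinates must be numbers")
        if not 1 <= len(coordinates) <= 3:
            raise ValueError("Expect 1-, 2-, or 3-element tuple")
        # Pad missing trailing coordinates with 0.0.
        coordinates += (0.0,) * (3 - len(coordinates))
        # Calling cls(coordinates) here would re-enter __new__ forever;
        # allocate through object.__new__ instead.
        self = super(Point, cls).__new__(cls)
        self.x, self.y, self.z = coordinates
        return self
```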
Validating in __new__ will allow you to catch problems
sooner, and give you the option of returning some sort of
sentinel instead of just raising an exception, though that
is probably not generally good practice. This will catch
Point("abc") quickly, with a stack trace pointing to the
offending code.
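For comparison, here is a sketch that places the same checks in `__init__` instead. The `ValidatedPoint` name is hypothetical. For an ordinary class, `__new__` and `__init__` both run inside the single `ValidatedPoint(...)` call, so this version also rejects `ValidatedPoint("abc")` before the caller ever receives an object. What `__new__` adds is the option to return something other than a fresh instance.

```python
import numbers


class ValidatedPoint(object):
    def __init__(self, coordinates):
        coordinates = tuple(coordinates)
        # Same checks as the __new__ version: all numbers, one to three.
        if not all(isinstance(c, numbers.Number) for c in coordinates):
            raise TypeError("Coordinates must be numbers")
        if not 1 <= len(coordinates) <= 3:
            raise ValueError("Expect 1-, 2-, or 3-element tuple")
        # Pad missing trailing coordinates with 0.0.
        coordinates += (0.0,) * (3 - len(coordinates))
        self.x, self.y, self.z = coordinates


p = ValidatedPoint((1, 2))
print(p.x, p.y, p.z)  # 1 2 0.0
try:
    ValidatedPoint("abc")
except TypeError as exc:
    print(exc)  # Coordinates must be numbers
```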
Of course, you might need to validate any other inputs which
appear after instantiation:
```python
def move(self, xdelta=0.0, ydelta=0.0, zdelta=0.0):
    self.x += xdelta
    self.y += ydelta
    self.z += zdelta
```
Whether you need more feedback than an exception might give
you here is debatable, but there are likely plenty of
situations where you need to explicitly validate user input
before using it (think of accepting string data from the net
which you plan to feed to your SQL database...). Those sorts
of validation steps are beyond the scope of this thread, and
probably much better handled by platforms further up the
software stack (like Django or SQLAlchemy).
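Here is a minimal sketch of validating inputs that arrive after instantiation, in this case the deltas passed to `move()`. Without a check, `move(xdelta=5.0, ydelta="oops")` would update `x` and then fail on `y`, leaving the point half-moved. Checking all arguments first avoids that. The simplified `Point` below is a hypothetical stand-in for Skip's class.

```python
import numbers


class Point(object):
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def move(self, xdelta=0.0, ydelta=0.0, zdelta=0.0):
        deltas = (xdelta, ydelta, zdelta)
        # Check everything first, so a bad argument leaves the point untouched.
        if not all(isinstance(d, numbers.Number) for d in deltas):
            raise TypeError("Deltas must be numbers")
        self.x += xdelta
        self.y += ydelta
        self.z += zdelta


p = Point(1.0, 2.0, 3.0)
try:
    p.move(xdelta=5.0, ydelta="oops")
except TypeError:
    pass
print(p.x, p.y, p.z)  # 1.0 2.0 3.0
p.move(zdelta=-1.0)
print(p.z)  # 2.0
```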
Skip