Groups > comp.lang.python > #60995 > unrolled thread

A cautionary tale

Started by	"Frank Millman" <frank@chagford.com>
First post	2013-12-04 11:16 +0200
Last post	2013-12-05 10:57 +0200
Articles	3 — 2 participants

Back to article view | Back to comp.lang.python

  A cautionary tale "Frank Millman" <frank@chagford.com> - 2013-12-04 11:16 +0200
    Re: A cautionary tale Steven D'Aprano <steve@pearwood.info> - 2013-12-05 08:06 +0000
      Re: A cautionary tale "Frank Millman" <frank@chagford.com> - 2013-12-05 10:57 +0200

#60995 — A cautionary tale

From	"Frank Millman" <frank@chagford.com>
Date	2013-12-04 11:16 +0200
Subject	A cautionary tale
Message-ID	<mailman.3548.1386148611.18130.python-list@python.org>

Hi all

There is no question at the end of this, it is just an account of a couple 
of days in the life of a programmer (me). I just felt like sharing it. Feel 
free to ignore.

The business/accounting system I am writing involves a lot of reading data 
from a database, and if changed, writing it back again.

There are a number of data types involved - string, integer, decimal, 
boolean, date, datetime. I currently support PostgreSQL, MS SQL Server, and 
sqlite3. In all cases, they have a DB-API 2.0-compliant adaptor which 
handles the conversion to and from python objects transparently.

Over the last year or so I have added two new types. I added a JSON type, to 
handle 'lists' and 'dicts', and an XML type to handle more complex 
structures. In both cases they are stored in the database as strings, so I 
have to handle the conversions myself.

I don't allow direct access to the objects, as they can be affected by 
various business rules, so I use getters and setters. For the new types, I 
used the getter to convert from the string to the underlying object, and the 
setter to convert it back to a string.

Then a couple of days ago I decided that this was not the correct place to 
do it - it should be done when reading from and writing to the database. 
That way the data is always represented by the underlying object, which can 
be passed around without worrying about conversions.

It was a bit of effort, as I had to add extra getters and setters to handle 
the transfer between the database and the program, and then over-ride them 
in the case of the new data types to provide the required functionality. But 
after a few hours of hunting down all the places that required changes, 
testing, fixing errors, etc, it seemed to be working fine, so I thought I 
could carry on with the meat of my program.

Then I noticed that certain changes were not being written back to the 
database. After some investigation, I found the error in a part of my 
program that I have not had to look at for ages. When reading data in from 
the database, I preserve a copy of the original value. When saving, I 
compare that to the current value when deciding which columns need updating. 
I do this in the obvious way -

    on reading -
        orig_value = value

    on saving -
       if value != orig_value:
            this one needs updating

Have you spotted the deliberate mistake yet? In the case of a JSON list, 
orig_value and value point to the same, mutable, list. So when I compare 
value with orig_value, they are always the same, whether changes have been 
made or not!

The obvious answer is to store a copy of the list. It was not so obvious 
where to make the change, as there were other implications. Eventually I 
decided to over-ride the 'getter' for the JSON type, and return copy(value) 
instead of value. That way if it is changed and then put back using the 
'setter', the two objects are no longer equal. I have made that change, done 
some more testing, and for now it seems ok.

So have the last couple of days been a waste of time? I don't think so. Is 
the program a bit cleaner and conceptually sounder? I hope so.

Why am I telling you all this? No particular reason, just thought some of 
you might find it interesting.

Frank Millman

[toc] | [next] | [standalone]

#61081

From	Steven D'Aprano <steve@pearwood.info>
Date	2013-12-05 08:06 +0000
Message-ID	<52a033f5$0$11112$c3e8da3@news.astraweb.com>
In reply to	#60995

On Wed, 04 Dec 2013 11:16:40 +0200, Frank Millman wrote:

[...]
> Then I noticed that certain changes were not being written back to the
> database. After some investigation, I found the error in a part of my
> program that I have not had to look at for ages. When reading data in
> from the database, I preserve a copy of the original value. When saving,
> I compare that to the current value when deciding which columns need
> updating. I do this in the obvious way -
> 
>     on reading -
>         orig_value = value

As you know, that is not the obvious way to create a copy. That binds a 
new name to the same value. To create a copy, you need:

orig_value = copy.copy(value)

or for lists:

orig_value = value[:]

I think we've all made the same mistake you have, it's an easy mistake to 
make, but for an experienced Python code it should also be an easy 
mistake to identify. Mind you, tracking it down might take a while...

>     on saving -
>        if value != orig_value:
>             this one needs updating
> 
> Have you spotted the deliberate mistake yet? In the case of a JSON list,
> orig_value and value point to the same, mutable, list. So when I compare
> value with orig_value, they are always the same, whether changes have
> been made or not!
> 
> The obvious answer is to store a copy of the list. It was not so obvious
> where to make the change, as there were other implications. Eventually I
> decided to over-ride the 'getter' for the JSON type, and return
> copy(value) instead of value. That way if it is changed and then put
> back using the 'setter', the two objects are no longer equal. I have
> made that change, done some more testing, and for now it seems ok.

I wouldn't do it that way. To my mind, the most obvious way of handling 
this is to add a "changed" flag to the object, and ensure that any 
modifications set the flag.

Of course, that may end up being more work than the way you've done it, 
but it will save making potentially many copies of objects which don't 
need to be copied.

> So have the last couple of days been a waste of time? I don't think so.
> Is the program a bit cleaner and conceptually sounder? I hope so.
> 
> Why am I telling you all this? No particular reason, just thought some
> of you might find it interesting.

Thank you for sharing your experiences with us!

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#61082

From	"Frank Millman" <frank@chagford.com>
Date	2013-12-05 10:57 +0200
Message-ID	<mailman.3605.1386233852.18130.python-list@python.org>
In reply to	#61081

"Steven D'Aprano" <steve@pearwood.info> wrote in message 
news:52a033f5$0$11112$c3e8da3@news.astraweb.com...
> On Wed, 04 Dec 2013 11:16:40 +0200, Frank Millman wrote:
>
> [...]
>> Then I noticed that certain changes were not being written back to the
>> database. After some investigation, I found the error in a part of my
>> program that I have not had to look at for ages. When reading data in
>> from the database, I preserve a copy of the original value. When saving,
>> I compare that to the current value when deciding which columns need
>> updating. I do this in the obvious way -
>>
>>     on reading -
>>         orig_value = value
>
> As you know, that is not the obvious way to create a copy. That binds a
> new name to the same value. To create a copy, you need:
>
> orig_value = copy.copy(value)
>
> or for lists:
>
> orig_value = value[:]
>

True enough, but before I added my new data types, all the types I used were 
immutable, so I got away with it.

[...]

>
> I wouldn't do it that way. To my mind, the most obvious way of handling
> this is to add a "changed" flag to the object, and ensure that any
> modifications set the flag.
>
> Of course, that may end up being more work than the way you've done it,
> but it will save making potentially many copies of objects which don't
> need to be copied.
>
>

There are two (no, make that three) reasons why I need to preserve the 
original value -

1. A user can make changes to an object, and then before saving, select 
'cancel'. I then restore all values to their originals.

2.  I keep an 'audit trail' of changes to data rows, showing before and 
after values.

3. I use a form of 'optimistic concurrency control' to guard against two 
users updating the same row at the same time, resulting in one set of 
changes being overwritten. The classic form adds a 'timestamp' column to the 
table, which is timestamped on every update. Prior to executing an update, 
the timestamp is re-read from the database, and if not the same as the 
original, the update is aborted. My variation is to compare only those 
columns which have been changed, so I compare the re-read values with the 
orig_values. The benefit is that an organisation could have one user 
maintaining, say, credit information, while another may be trying to change 
the same person's telephone number. With my approach, both will succeed even 
if executed simultaneously [1].

Frank

[1] Not literally simultaneously. All that is required is that one user 
selects the row, and then before updating it, someone else selects the same 
row.

[toc] | [prev] | [standalone]

csiph-web

A cautionary tale

Contents

#60995 — A cautionary tale

#61081

#61082