Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #60995 > unrolled thread
| Started by | "Frank Millman" <frank@chagford.com> |
|---|---|
| First post | 2013-12-04 11:16 +0200 |
| Last post | 2013-12-05 10:57 +0200 |
| Articles | 3 — 2 participants |
Back to article view | Back to comp.lang.python
A cautionary tale "Frank Millman" <frank@chagford.com> - 2013-12-04 11:16 +0200
Re: A cautionary tale Steven D'Aprano <steve@pearwood.info> - 2013-12-05 08:06 +0000
Re: A cautionary tale "Frank Millman" <frank@chagford.com> - 2013-12-05 10:57 +0200
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2013-12-04 11:16 +0200 |
| Subject | A cautionary tale |
| Message-ID | <mailman.3548.1386148611.18130.python-list@python.org> |
Hi all
There is no question at the end of this, it is just an account of a couple
of days in the life of a programmer (me). I just felt like sharing it. Feel
free to ignore.
The business/accounting system I am writing involves a lot of reading data
from a database, and if changed, writing it back again.
There are a number of data types involved - string, integer, decimal,
boolean, date, datetime. I currently support PostgreSQL, MS SQL Server, and
sqlite3. In all cases, they have a DB-API 2.0-compliant adaptor which
handles the conversion to and from python objects transparently.
Over the last year or so I have added two new types. I added a JSON type, to
handle 'lists' and 'dicts', and an XML type to handle more complex
structures. In both cases they are stored in the database as strings, so I
have to handle the conversions myself.
I don't allow direct access to the objects, as they can be affected by
various business rules, so I use getters and setters. For the new types, I
used the getter to convert from the string to the underlying object, and the
setter to convert it back to a string.
Then a couple of days ago I decided that this was not the correct place to
do it - it should be done when reading from and writing to the database.
That way the data is always represented by the underlying object, which can
be passed around without worrying about conversions.
It was a bit of effort, as I had to add extra getters and setters to handle
the transfer between the database and the program, and then over-ride them
in the case of the new data types to provide the required functionality. But
after a few hours of hunting down all the places that required changes,
testing, fixing errors, etc, it seemed to be working fine, so I thought I
could carry on with the meat of my program.
Then I noticed that certain changes were not being written back to the
database. After some investigation, I found the error in a part of my
program that I have not had to look at for ages. When reading data in from
the database, I preserve a copy of the original value. When saving, I
compare that to the current value when deciding which columns need updating.
I do this in the obvious way -
on reading -
orig_value = value
on saving -
if value != orig_value:
this one needs updating
Have you spotted the deliberate mistake yet? In the case of a JSON list,
orig_value and value point to the same, mutable, list. So when I compare
value with orig_value, they are always the same, whether changes have been
made or not!
The obvious answer is to store a copy of the list. It was not so obvious
where to make the change, as there were other implications. Eventually I
decided to over-ride the 'getter' for the JSON type, and return copy(value)
instead of value. That way if it is changed and then put back using the
'setter', the two objects are no longer equal. I have made that change, done
some more testing, and for now it seems ok.
So have the last couple of days been a waste of time? I don't think so. Is
the program a bit cleaner and conceptually sounder? I hope so.
Why am I telling you all this? No particular reason, just thought some of
you might find it interesting.
Frank Millman
[toc] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-12-05 08:06 +0000 |
| Message-ID | <52a033f5$0$11112$c3e8da3@news.astraweb.com> |
| In reply to | #60995 |
On Wed, 04 Dec 2013 11:16:40 +0200, Frank Millman wrote: [...] > Then I noticed that certain changes were not being written back to the > database. After some investigation, I found the error in a part of my > program that I have not had to look at for ages. When reading data in > from the database, I preserve a copy of the original value. When saving, > I compare that to the current value when deciding which columns need > updating. I do this in the obvious way - > > on reading - > orig_value = value As you know, that is not the obvious way to create a copy. That binds a new name to the same value. To create a copy, you need: orig_value = copy.copy(value) or for lists: orig_value = value[:] I think we've all made the same mistake you have, it's an easy mistake to make, but for an experienced Python code it should also be an easy mistake to identify. Mind you, tracking it down might take a while... > on saving - > if value != orig_value: > this one needs updating > > Have you spotted the deliberate mistake yet? In the case of a JSON list, > orig_value and value point to the same, mutable, list. So when I compare > value with orig_value, they are always the same, whether changes have > been made or not! > > The obvious answer is to store a copy of the list. It was not so obvious > where to make the change, as there were other implications. Eventually I > decided to over-ride the 'getter' for the JSON type, and return > copy(value) instead of value. That way if it is changed and then put > back using the 'setter', the two objects are no longer equal. I have > made that change, done some more testing, and for now it seems ok. I wouldn't do it that way. To my mind, the most obvious way of handling this is to add a "changed" flag to the object, and ensure that any modifications set the flag. Of course, that may end up being more work than the way you've done it, but it will save making potentially many copies of objects which don't need to be copied. > So have the last couple of days been a waste of time? I don't think so. > Is the program a bit cleaner and conceptually sounder? I hope so. > > Why am I telling you all this? No particular reason, just thought some > of you might find it interesting. Thank you for sharing your experiences with us! -- Steven
[toc] | [prev] | [next] | [standalone]
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2013-12-05 10:57 +0200 |
| Message-ID | <mailman.3605.1386233852.18130.python-list@python.org> |
| In reply to | #61081 |
"Steven D'Aprano" <steve@pearwood.info> wrote in message news:52a033f5$0$11112$c3e8da3@news.astraweb.com... > On Wed, 04 Dec 2013 11:16:40 +0200, Frank Millman wrote: > > [...] >> Then I noticed that certain changes were not being written back to the >> database. After some investigation, I found the error in a part of my >> program that I have not had to look at for ages. When reading data in >> from the database, I preserve a copy of the original value. When saving, >> I compare that to the current value when deciding which columns need >> updating. I do this in the obvious way - >> >> on reading - >> orig_value = value > > As you know, that is not the obvious way to create a copy. That binds a > new name to the same value. To create a copy, you need: > > orig_value = copy.copy(value) > > or for lists: > > orig_value = value[:] > True enough, but before I added my new data types, all the types I used were immutable, so I got away with it. [...] > > I wouldn't do it that way. To my mind, the most obvious way of handling > this is to add a "changed" flag to the object, and ensure that any > modifications set the flag. > > Of course, that may end up being more work than the way you've done it, > but it will save making potentially many copies of objects which don't > need to be copied. > > There are two (no, make that three) reasons why I need to preserve the original value - 1. A user can make changes to an object, and then before saving, select 'cancel'. I then restore all values to their originals. 2. I keep an 'audit trail' of changes to data rows, showing before and after values. 3. I use a form of 'optimistic concurrency control' to guard against two users updating the same row at the same time, resulting in one set of changes being overwritten. The classic form adds a 'timestamp' column to the table, which is timestamped on every update. Prior to executing an update, the timestamp is re-read from the database, and if not the same as the original, the update is aborted. My variation is to compare only those columns which have been changed, so I compare the re-read values with the orig_values. The benefit is that an organisation could have one user maintaining, say, credit information, while another may be trying to change the same person's telephone number. With my approach, both will succeed even if executed simultaneously [1]. Frank [1] Not literally simultaneously. All that is required is that one user selects the row, and then before updating it, someone else selects the same row.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web