Groups > comp.databases.postgresql > #133 > unrolled thread

migrating oracle to postgres

Started by	Username <user@domain.tld>
First post	2011-06-20 19:28 +0200
Last post	2011-06-28 22:28 +0200
Articles	20 on this page of 60 — 12 participants

Back to article view | Back to comp.databases.postgresql

  migrating oracle to postgres Username <user@domain.tld> - 2011-06-20 19:28 +0200
    Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-20 20:08 +0000
      Re: migrating oracle to postgres Username <user@domain.tld> - 2011-06-20 23:45 +0200
        Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-20 22:37 +0000
        Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-21 02:01 +0000
    Re: migrating oracle to postgres "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-21 09:18 +0200
      Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-21 13:38 +0000
        Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-22 12:30 +0200
          Re: Is PostgreSQL good? Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-22 13:15 +0200
          Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-22 12:02 +0000
            Re: Is PostgreSQL good? "M. Strobel" <sorry_no_mail_here@nowhere.dee> - 2011-06-22 21:52 +0200
              Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-22 22:43 +0000
            Re: Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-24 12:29 +0200
              Re: Is PostgreSQL good? Hans Castorp <REWYRLXHEGHO@spammotel.com> - 2011-06-24 12:57 +0200
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-24 14:53 +0200
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-24 16:32 +0000
              Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-24 13:53 +0000
                Re: Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-27 11:17 +0200
                  Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-28 06:11 +0000
                    Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-28 10:34 +0200
                      Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-28 13:30 +0000
                        Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-29 09:50 +0200
                          Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-29 21:38 +0000
                            Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-30 09:14 +0200
                              Re: Is PostgreSQL good? Mladen Gogala <mgogala@no.address.invalid> - 2011-06-30 13:22 +0000
              Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 07:05 -0700
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 15:41 +0000
                  Re: Is PostgreSQL good? Richard Kettlewell <rjk@greenend.org.uk> - 2011-06-30 16:49 +0100
                    Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 19:24 +0000
                      Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 13:21 -0700
                        Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 22:42 +0000
                          Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-01 11:57 -0700
                            Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 22:58 +0000
                      Re: Is PostgreSQL good? Richard Kettlewell <rjk@greenend.org.uk> - 2011-07-01 14:07 +0100
                        Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 13:29 +0000
                  Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 13:17 -0700
                    Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-30 23:53 +0000
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-01 10:13 +0200
                  Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 13:38 +0000
                    Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-01 16:58 +0200
                      Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-07-01 15:33 +0000
                  Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-01 11:39 -0700
                    Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-04 16:39 +0200
                      Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-04 12:11 -0700
                        Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-05 08:58 +0200
                          Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-05 11:29 -0700
                            Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-06 09:00 +0200
                              Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-06 01:00 -0700
                                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-07 09:05 +0200
                                  Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-07 00:34 -0700
                    Re: Is PostgreSQL good? Matthew Woodcraft <mattheww@chiark.greenend.org.uk> - 2011-07-04 20:12 +0100
                      Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-04 12:24 -0700
        Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-23 17:48 +0200
          Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-23 16:42 +0000
            Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-23 18:57 +0000
            Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-26 17:35 +0200
              Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-26 19:08 +0000
            Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-27 00:32 -0700
              Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-28 16:00 +0000
                Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-28 22:28 +0200

Page 3 of 3 — ← Prev page 1 2 [3]

#179 — Re: Is PostgreSQL good?

From	Mladen Gogala <no@email.here.invalid>
Date	2011-07-01 15:33 +0000
Subject	Re: Is PostgreSQL good?
Message-ID	<pan.2011.07.01.15.33.42@email.here.invalid>
In reply to	#178

On Fri, 01 Jul 2011 16:58:09 +0200, Laurenz Albe wrote:


> Thanks for the correction. I have not worked with DB2 since 2001.

Unfortunately, the company that work for has turned its last AIX box off 
in January :). I will not be able to work with it again, for a 
foreseeable future.



-- 
http://mgogala.byethost5.com

[toc] | [prev] | [next] | [standalone]

#180 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-01 11:39 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iul48k$9v1$1@speranza.aioe.org>
In reply to	#174

Hi Laurenz,

On 7/1/2011 1:13 AM, Laurenz Albe wrote:
> Don Y wrote:
>> [apologies for not having the technical depth to understand these
>> issues better -- I just use PostgreSQL as a "means to an end"...
>> I could use any similar "means" but this seems to work well enough
>> for me]
>
> No worries, but don't take it as rudeness if I give you links for
> some of these question instead of answering everything in detail.

Understood.

>> Can someone please, *gently*, explain *quantitatively* the differences
>> that these issues make in performance?
>
> Quantifying performance is only possible in a narrowly defined setting
> (in my opinion).

Of course.  I was trying to get an idea as to what was "at
stake".

E.g., a peephole optimizer (in a compiled language) makes very
small improvements -- changing, for example:
    store foo
    load foo
into the equivalent:
    store foo
(since foo is already loaded, no need to REload it).  The
savings are typically insignificant unless in a very *tight*
loop where the cost of that *single* saved instruction represents
a large portion of the cost of a single iteration.

OTOH, rewriting:
     if ((count_all_people_in_the_world() && (today != weekday)) ...
as:
     if ((today != weekday) && (count_all_people_in_the_world()) ...
provides *big* savings as count_all_people_in_the_world() isn't
done *most* days (because today is more often a weekday than a
NONweekday!).

>> Are hints on a par with the 'register' keyword in C?  I.e.,
>> something you *hope* gives the compiler an insight into your
>> algorithm -- but, that the compiler is free to *ignore*?
>> Are there consequences to using a "hint" (e.g., using 'register'
>> places constraints on what you can do)?
>
> Roughly yes. Except that (at least in Oracle) a hint will only
> be ignored if it is impossible to use (you cannot force an index
> that cannot be used for this query).

So, if the hint is "a bad idea", the DBMS won't ignore it, even
if it "knows better"?  I.e., the DBA can *worsen* performance
by hinting incorrectly?

>> Is the "hint" concept essentially a workaround to compensate for
>> capabilities *missing* in the optimizer?  I.e., as C compilers
>> get smarter, they deliberately *choose* to ignore the 'register'
>> keyword because they actually have *more* insight than the
>> programmer!
>
> Yes, it is a workaround for optimizer deficiencies.
> The problem here is that optimizing SQL statements is quite difficult
> because SQL is a declarative language, i.e. you tell the system what
> you want to get and not how to get it. The optimizer will have to
> figure out an execution path.

Understood.  The "hint" plays the same intended role as "register"
in C -- "I, the human, know more about this than you, the computer,
so take my advice and do *this*...".  But, in C, the compiler
can effectively say, "Silly Rabbit, no you *don't*!"

>> Are there ways (without using a "hint") to coerce the database
>> to perform in the way your hint would *hope* to have it behave?
>> (e.g., can you rewrite the SQL to cause the behavior you seek?)
>
> Sometimes, yes. Often a simplified statement will be easier
> to optimize. If you know the optimizer well, you can also trick
> it into not considering certain optimizations.
> The optimizer uses "statistics" which are data about the number
> and distribution of data in table columns. A lot of times the
> optimizer will perform better if you collect more data.
> You can also set database parameters that influence the optimizer.

Are those parameters effectively "hints of a different color"?
I.e., hints regarding the makeup of the *data* themselves
instead of a particular *query*?

>> What sort of *quantitative* difference can this make (example?)
>> that can't, otherwise, be attained?
>
> The effect of choosing a better plan can improve things *a lot*.

Oh.  Are these situations likely to creep in often and *not*
be obvious to the DBA ahead of time?  I.e., why would he
know to "hint" at them yet the optimizer not be able to
foresee the same issue(s)?

> If you fetch 10 million rows, only to discard them later because
> you join with an empty result set, that's *much* worse than not
> fetching anything at all.

Again, why would the DBA know this is likely to be the case
but the optimizer (planner) *doesn't*?

>>> About vacuum:
>>> =============
>>>
>>> Of course vacuum is a pain, but every database system has to pay a similar
>>> price somewhere. Either you reduce concurrency (DB2), or (if you use some kind
>>> of MVCC) you can move dead rows somewhere else (Oracle) or leave them where
>>> they are and clean up later.
>>
>> Is this just the difference between automatic garbage collection
>> and "manual" garbage collection (again, relying on parallels to
>> other "languages")?
>
> No. DB2 does not create garbage, because there is only one version
> of each row at any given time. To get that, a writer has to wait
> until a reader is done (i.e., the reader will take a "read lock"
> on the row he or she is reading).

Understood.  There is, therefore, no need to vacuum.

> Oracle and PostgreSQL both create garbage, because they both keep
> old versions of rows around to avoid readers blocking writers.
> Oracle collects the garbage automatically *while* it executes
> SQL statements.
>
> PostgreSQL collects garbage at some later time, either manually
> by an administrator running the VACUUM command, or automatically
> if the autovacuum daemon is turned on.

That was what I *thought* (intuitively) the role of VACUUM was.
So, *I*, knowing what I have done and am likely to do *next*,
can decide (if auto is off) whether it behooves me to vacuum
*now* or "wait until after this next -- which will be my
LAST -- query, at which time the DB will be idle"

>> Does the parallel maintain throughout operation or does it fall
>> apart at the extremes?  E.g., if garbage collection must be
>> manually initiated (in a language), then failing to do so
>> can eventually cause you to run out of heap, etc. (which
>> causes the application to "fail").  Is the same true when
>> it comes to vacuuming -- or, does performance just suffer?
>> (i.e., will you ever "fill the disk" with dead rows if you
>> fail to initiate a VACUUM operation?)
>
> All of this is only relevant if you have autovacuum turned off
> (autovacuum is the recommended way these days).

Understood.

> Without autovacuum and if VACUUM is never run, two things will
> happen:
> 1) Your tables and indexes will grow on every insert and update,
>     and deletes and updates will create garbage that cannot be
>     reused. That can fill the disk and drastically increase the
>     duration of table and index scans.
> 2) You will at some point run out of transaction IDs, and the
>     system will shutdown to prevent data loss.

So, it's just like turning off the GC and waiting for the
free store to be exhausted.  Sooner or later, there's no
more "memory" available and a memory allocation request
fails -- causing the application to fail at that point
(though the "recovery" routine could simply manually
initiate GC at that point and rerun the request).

> See
> http://www.postgresql.org/docs/current/static/routine-vacuuming.html
> for details.
>
>>> ROWID:
>>> ======
>>>
>>> You make the point that because of VACUUM there is nothing compared to
>>> Oracle's ROWID. But do you really need that? Shouldn't a row be referenced
>>> by its primary key?
>>
>> The last question seems to be my understanding of how things should
>> work. (?)  (actually, one of the harder ideas for me to get accustomed
>> to in this field  :<  )
>
> Right. It was a rethorical question.
>
> To digress, I always find a library a good analogy for a database.
> You have a lot of books there, each has a title and an author.
>
> A primary key is a) an index (like an index in a library) and
> b) a rule that says that there may only be one book for every
> possible index entry.
>
> Without a primary key, one author could have more than one book with
> the same title, and you won't be able to tell them apart (even though
> the books themselves may be different).

Understood.  I tend to create an "ID" field in each table to
act as a primary key (because there often is no other way
to differentiate entries/rows -- and, because this maps
nicely to how I think of "pointers" in my code :> )

>> How does a ROWID differ from using an OID?  (which I presume is
>> not a good thing)
>
> The ROWID is the physical location of the row, basically the offset
> in the file. You can sort of just "lseek(3)" there and get it. This
> is the fastest way to access a row.
>
> An OID is a number from a system counter that *can* also be used
> as a primary key. To some extent, it's just a more old-fashioned
> way of what can be done better with a sequence these days.
> If used as a primary key, it is no faster than other primary keys.

Ah!  Excellent!  That makes the performance aspect very clear.
The OID can be changed, arbitrarily, while changing the ROWID
requires "physically" moving the record.

> See the final paragraphs of
> http://www.postgresql.org/docs/current/static/ddl-system-columns.html
> for details.
>
>>> Fixed size of WAL segments:
>>> ===========================
>>>
>>> Maybe I am dense, but what is the problem there?
>>
>> <grin>   I don't even know what they *are*!  ;-)
>
> WAL stands for "write ahead log" and is a transaction log.
> It is a way to provide durability for committed transactions in case
> of a system failure.

OK.  So, it's just another "missing feature" -- like buying a car
without air conditioning (not necessary if you live in a cold
climate)

> See
> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
> for details.

Thanks for taking the time (and pointers) to shed some light
on this.  Eventually, the WAL might be significant for me but
I doubt hints would ever be as, at a minimum, they would require
me to know a lot more about how the planner worked and how to
convince it to "do something else".

--don

[toc] | [prev] | [next] | [standalone]

#183 — Re: Is PostgreSQL good?

From	"Laurenz Albe" <invite@spam.to.invalid>
Date	2011-07-04 16:39 +0200
Subject	Re: Is PostgreSQL good?
Message-ID	<1309790420.3138@proxy.dienste.wien.at>
In reply to	#180

Don Y wrote:
>>> Are hints on a par with the 'register' keyword in C?  I.e.,
>>> something you *hope* gives the compiler an insight into your
>>> algorithm -- but, that the compiler is free to *ignore*?
>>> Are there consequences to using a "hint" (e.g., using 'register'
>>> places constraints on what you can do)?

>> Roughly yes. Except that (at least in Oracle) a hint will only
>> be ignored if it is impossible to use (you cannot force an index
>> that cannot be used for this query).

> So, if the hint is "a bad idea", the DBMS won't ignore it, even
> if it "knows better"?  I.e., the DBA can *worsen* performance
> by hinting incorrectly?

Yes.

>>> Are there ways (without using a "hint") to coerce the database
>>> to perform in the way your hint would *hope* to have it behave?
>>> (e.g., can you rewrite the SQL to cause the behavior you seek?)

>> Sometimes, yes. Often a simplified statement will be easier
>> to optimize. If you know the optimizer well, you can also trick
>> it into not considering certain optimizations.
>> The optimizer uses "statistics" which are data about the number
>> and distribution of data in table columns. A lot of times the
>> optimizer will perform better if you collect more data.
>> You can also set database parameters that influence the optimizer.

> Are those parameters effectively "hints of a different color"?
> I.e., hints regarding the makeup of the *data* themselves
> instead of a particular *query*?

No, they are a coarser version of hints.
They tell the database things like "do not use a nested loop join",
while a hint would say "use a merge join for this specific join
operation".
Or "it's really a good idea to use an index even if you think it
is not optimal" compared to "use this index".

>> The effect of choosing a better plan can improve things *a lot*.

> Oh.  Are these situations likely to creep in often and *not*
> be obvious to the DBA ahead of time?  I.e., why would he
> know to "hint" at them yet the optimizer not be able to
> foresee the same issue(s)?

These situations creep up and are not obvious.
Whether they creep up often or not depends on the quality of the
optimizer and the complexity of your queries and data.

Usually hints and other tuning facilities are used after a problem
has been discovered.

They are hard to discover in advance, because the problems usually
only manifest with realistict amounts of data.

>> If you fetch 10 million rows, only to discard them later because
>> you join with an empty result set, that's *much* worse than not
>> fetching anything at all.

> Again, why would the DBA know this is likely to be the case
> but the optimizer (planner) *doesn't*?

First, the planner is not perfect.

Moreover, it is not always possible to know these things in advance.
In the described simple case, the planner might go wrong because
he does not know that the second result set will be empty before he
actually tries it.
The DBA (or more likely the programmer), who has knowledge about
the data that is not evident from the database layout (like "no male
can ever have an abortion") can sometimes know more than the planner.

> That was what I *thought* (intuitively) the role of VACUUM was.
> So, *I*, knowing what I have done and am likely to do *next*,
> can decide (if auto is off) whether it behooves me to vacuum
> *now* or "wait until after this next -- which will be my
> LAST -- query, at which time the DB will be idle"

Not quite, because routine vacuuming is not done by the end user
who executes statements, but by an administrator. The end user will
not have the necessary permissions to vacuum all tables in the database.
With non-automatic vacuum, the DBA can for example schedule a daily
VACUUM of all tables at 11 p.m. because he or she knows that nothing
much is going on at that time.

>> WAL stands for "write ahead log" and is a transaction log.
>> It is a way to provide durability for committed transactions in case
>> of a system failure.

> OK.  So, it's just another "missing feature" -- like buying a car
> without air conditioning (not necessary if you live in a cold
> climate)

[...]

> Thanks for taking the time (and pointers) to shed some light
> on this.  Eventually, the WAL might be significant for me but
> I doubt hints would ever be as, at a minimum, they would require
> me to know a lot more about how the planner worked and how to
> convince it to "do something else".

WAL is unimportant for you only then if data loss is not a
problem for you, and yesterday's backup is good enough.

Yours,
Laurenz Albe

[toc] | [prev] | [next] | [standalone]

#184 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-04 12:11 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iut38g$9da$1@speranza.aioe.org>
In reply to	#183

Hi Laurenz,

[snips throughout]

On 7/4/2011 7:39 AM, Laurenz Albe wrote:

>>> Roughly yes. Except that (at least in Oracle) a hint will only
>>> be ignored if it is impossible to use (you cannot force an index
>>> that cannot be used for this query).
>
>> So, if the hint is "a bad idea", the DBMS won't ignore it, even
>> if it "knows better"?  I.e., the DBA can *worsen* performance
>> by hinting incorrectly?
>
> Yes.

Ah!  *But*, it can only "worsen (or improve) performance" -- not
affect the "correctness" of the result?

>>> The effect of choosing a better plan can improve things *a lot*.
>
>> Oh.  Are these situations likely to creep in often and *not*
>> be obvious to the DBA ahead of time?  I.e., why would he
>> know to "hint" at them yet the optimizer not be able to
>> foresee the same issue(s)?
>
> These situations creep up and are not obvious.
> Whether they creep up often or not depends on the quality of the
> optimizer and the complexity of your queries and data.

So, you stumble over them when a particular query (etc.)
unexpectedly proves to be more expensive than you had
"expected"?

> Usually hints and other tuning facilities are used after a problem
> has been discovered.
>
> They are hard to discover in advance, because the problems usually
> only manifest with realistict amounts of data.

And, presumably, hints only benefit *specific* dataset-query
combinations.  I.e., change the query and the hint's value may
change?

>>> If you fetch 10 million rows, only to discard them later because
>>> you join with an empty result set, that's *much* worse than not
>>> fetching anything at all.
>
>> Again, why would the DBA know this is likely to be the case
>> but the optimizer (planner) *doesn't*?
>
> First, the planner is not perfect.

Understood.  Neither is the programmer/DBA!  :>

> Moreover, it is not always possible to know these things in advance.
> In the described simple case, the planner might go wrong because
> he does not know that the second result set will be empty before he
> actually tries it.
> The DBA (or more likely the programmer), who has knowledge about
> the data that is not evident from the database layout (like "no male
> can ever have an abortion") can sometimes know more than the planner.

OK.  That would be very application-specific (and dataset-specific)
knowledge, though. (?)

>> That was what I *thought* (intuitively) the role of VACUUM was.
>> So, *I*, knowing what I have done and am likely to do *next*,
>> can decide (if auto is off) whether it behooves me to vacuum
>> *now* or "wait until after this next -- which will be my
>> LAST -- query, at which time the DB will be idle"
>
> Not quite, because routine vacuuming is not done by the end user
> who executes statements, but by an administrator. The end user will
> not have the necessary permissions to vacuum all tables in the database.

Yes, understood.  I'm thinking about *my* applications where there
is an agent interposed between the end user and the DBMS acting
as if the administrator.  So, it can run some commands on behalf of
the user and *then* decide to VACUUM (having inherent knowledge
of the consequences of those commands).

> With non-automatic vacuum, the DBA can for example schedule a daily
> VACUUM of all tables at 11 p.m. because he or she knows that nothing
> much is going on at that time.

That assumes you have sufficient free store to let a day's worth
of "garbage" accumulate.

>> Thanks for taking the time (and pointers) to shed some light
>> on this.  Eventually, the WAL might be significant for me but
>> I doubt hints would ever be as, at a minimum, they would require
>> me to know a lot more about how the planner worked and how to
>> convince it to "do something else".
>
> WAL is unimportant for you only then if data loss is not a
> problem for you, and yesterday's backup is good enough.

In my case, the opposite is true -- the application needs to
avoid "stumbling", if at all possible.  Going back to
yesterday's dataset would be very painful.

Thanks for the insights!
--don

[toc] | [prev] | [next] | [standalone]

#187 — Re: Is PostgreSQL good?

From	"Laurenz Albe" <invite@spam.to.invalid>
Date	2011-07-05 08:58 +0200
Subject	Re: Is PostgreSQL good?
Message-ID	<1309849143.243811@proxy.dienste.wien.at>
In reply to	#184

Don Y wrote:
[about query hints]

> Ah!  *But*, it can only "worsen (or improve) performance" -- not
> affect the "correctness" of the result?

Correct.

[about bad execution plans]

> So, you stumble over them when a particular query (etc.)
> unexpectedly proves to be more expensive than you had
> "expected"?

Usually, that's how it is. Unless you Do It Right and test with
a realistic set of data beforehand.

> And, presumably, hints only benefit *specific* dataset-query
> combinations.  I.e., change the query and the hint's value may
> change?

Yes. A query hint is part of the query and refers to certain
parts of it.

>> The DBA (or more likely the programmer), who has knowledge about
>> the data that is not evident from the database layout (like "no male
>> can ever have an abortion") can sometimes know more than the planner.

> OK.  That would be very application-specific (and dataset-specific)
> knowledge, though. (?)

Right. That's what I am talking about.

>> Not quite, because routine vacuuming is not done by the end user
>> who executes statements, but by an administrator. The end user will
>> not have the necessary permissions to vacuum all tables in the database.

> Yes, understood.  I'm thinking about *my* applications where there
> is an agent interposed between the end user and the DBMS acting
> as if the administrator.  So, it can run some commands on behalf of
> the user and *then* decide to VACUUM (having inherent knowledge
> of the consequences of those commands).

That sounds like a really bad idea.

If the database user that performs SQL on behalf of the end user
has superuser privileges, that constitutes an unnecessary risk.
Security holes or bugs in your software can cause much more damage
that way.

>> With non-automatic vacuum, the DBA can for example schedule a daily
>> VACUUM of all tables at 11 p.m. because he or she knows that nothing
>> much is going on at that time.

> That assumes you have sufficient free store to let a day's worth
> of "garbage" accumulate.

Of course.

>> WAL is unimportant for you only then if data loss is not a
>> problem for you, and yesterday's backup is good enough.
>
> In my case, the opposite is true -- the application needs to
> avoid "stumbling", if at all possible.  Going back to
> yesterday's dataset would be very painful.

Then you need to learn about WAL and point-in-time recovery.

Yours,
Laurenz Albe

[toc] | [prev] | [next] | [standalone]

#188 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-05 11:29 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iuvl71$5i9$1@speranza.aioe.org>
In reply to	#187

Hi Laurenz,

On 7/4/2011 11:58 PM, Laurenz Albe wrote:

>> And, presumably, hints only benefit *specific* dataset-query
>> combinations.  I.e., change the query and the hint's value may
>> change?
>
> Yes. A query hint is part of the query and refers to certain
> parts of it.

So, unless you have accurate, prior knowledge of the dataset's
actual complexion, you stand a chance of shooting yourself in the
foot *if* you hint (based on assumptions you make that may not
turn out to be true)

>>> Not quite, because routine vacuuming is not done by the end user
>>> who executes statements, but by an administrator. The end user will
>>> not have the necessary permissions to vacuum all tables in the database.
>
>> Yes, understood.  I'm thinking about *my* applications where there
>> is an agent interposed between the end user and the DBMS acting
>> as if the administrator.  So, it can run some commands on behalf of
>> the user and *then* decide to VACUUM (having inherent knowledge
>> of the consequences of those commands).
>
> That sounds like a really bad idea.
>
> If the database user that performs SQL on behalf of the end user
> has superuser privileges, that constitutes an unnecessary risk.
> Security holes or bugs in your software can cause much more damage
> that way.

No choice.  There is no "organic" DBA involved.  So, even if I
let autovacuum handle that aspect, there will always be other
aspects that have to be "coded" (and "Just Work")

>>> With non-automatic vacuum, the DBA can for example schedule a daily
>>> VACUUM of all tables at 11 p.m. because he or she knows that nothing
>>> much is going on at that time.
>
>> That assumes you have sufficient free store to let a day's worth
>> of "garbage" accumulate.
>
> Of course.

OK.  I can't guarantee that so it is something I will have to
be aware of.

>>> WAL is unimportant for you only then if data loss is not a
>>> problem for you, and yesterday's backup is good enough.
>>
>> In my case, the opposite is true -- the application needs to
>> avoid "stumbling", if at all possible.  Going back to
>> yesterday's dataset would be very painful.
>
> Then you need to learn about WAL and point-in-time recovery.

<grin>  I'll add it to the List of Stuff I Must Learn.

Thanks!
--don

[toc] | [prev] | [next] | [standalone]

#189 — Re: Is PostgreSQL good?

From	"Laurenz Albe" <invite@spam.to.invalid>
Date	2011-07-06 09:00 +0200
Subject	Re: Is PostgreSQL good?
Message-ID	<1309935622.273345@proxy.dienste.wien.at>
In reply to	#188

Don Y wrote:
>>> Yes, understood.  I'm thinking about *my* applications where there
>>> is an agent interposed between the end user and the DBMS acting
>>> as if the administrator.  So, it can run some commands on behalf of
>>> the user and *then* decide to VACUUM (having inherent knowledge
>>> of the consequences of those commands).

>> That sounds like a really bad idea.
>>
>> If the database user that performs SQL on behalf of the end user
>> has superuser privileges, that constitutes an unnecessary risk.
>> Security holes or bugs in your software can cause much more damage
>> that way.

> No choice.  There is no "organic" DBA involved.  So, even if I
> let autovacuum handle that aspect, there will always be other
> aspects that have to be "coded" (and "Just Work")

You can automatize administrative tasks, that's a good thing,
but that does not necessitate that the end user gets served by a
superuser account. Those things should be done separately.

You won't get away without some sort of human DBA.
You must at least ascertain that backups complete successfully
and space doesn't run out. And you need somebody who is able to
restore a backup. So at least when things go wrong, you need a DBA.

Yours,
Laurenz Albe

[toc] | [prev] | [next] | [standalone]

#190 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-06 01:00 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iv14m0$b0u$1@speranza.aioe.org>
In reply to	#189

Hi Laurenz,

On 7/6/2011 12:00 AM, Laurenz Albe wrote:
> Don Y wrote:
>>>> Yes, understood.  I'm thinking about *my* applications where there
>>>> is an agent interposed between the end user and the DBMS acting
>>>> as if the administrator.  So, it can run some commands on behalf of
>>>> the user and *then* decide to VACUUM (having inherent knowledge
>>>> of the consequences of those commands).
>
>>> That sounds like a really bad idea.
>>>
>>> If the database user that performs SQL on behalf of the end user
>>> has superuser privileges, that constitutes an unnecessary risk.
>>> Security holes or bugs in your software can cause much more damage
>>> that way.
>
>> No choice.  There is no "organic" DBA involved.  So, even if I
>> let autovacuum handle that aspect, there will always be other
>> aspects that have to be "coded" (and "Just Work")
>
> You can automatize administrative tasks, that's a good thing,
> but that does not necessitate that the end user gets served by a
> superuser account. Those things should be done separately.
>
> You won't get away without some sort of human DBA.
> You must at least ascertain that backups complete successfully
> and space doesn't run out. And you need somebody who is able to
> restore a backup. So at least when things go wrong, you need a DBA.

The intend is to deploy in *devices* -- not "servers".  For a
(crude) parallel, imagine the "contacts", etc. in your cell phone
were being managed by PostgreSQL.  I.e., a set of relations
defining people, their phone numbers, log of recent calls, etc.

[recall, I am using a cell phone solely as an analogy as the
usage patterns are similar -- largely "fixed"]

The sorts of relations are all predefined.  A (cell phone) "user"
could opt to add a name to his "contact list".  Or, delete one.
Or "edit" an existing contact.  Or, purge the log of the least
recent calls.  etc.

He might even want to add "another" contact list (named "business
contacts"), etc.

But, he's not creating any arbitrary relations.  And, the queries
he'll run are predefined -- with "fill in the blank" parameters, etc.

I.e., if PostgreSQL can't keep the relations intact, then it has
fundamental bugs (in which case, why would *anyone* be using it?).

If PostgreSQL isn't "ready for prime time", in that regard, I
have taken pains to define all my relations in such a way that I
could fall back to cruder tools (e.g., dbm-style where I do most
of the "work") and, even, a sparse VM implementation (linking
relations with actual pointers).

I have a trial, "server based" application running "unattended"
that has performed well so far (despite abuses to the server
itself).  And, one of the two projects in the works currently
takes that a step further (more traffic, users, etc.).  So,
I've got "an out" for the other project if I discover PG can't
"keep a clean house" without someone babysitting it...  :<

<shrug>  We'll see.  AFAIK, it hasn't been tried on this large
a scale previously (?)  As I said, the things I'm expecting from
the DBMS are probably different than what most users/DBA's would
expect.

--don

[toc] | [prev] | [next] | [standalone]

#191 — Re: Is PostgreSQL good?

From	"Laurenz Albe" <invite@spam.to.invalid>
Date	2011-07-07 09:05 +0200
Subject	Re: Is PostgreSQL good?
Message-ID	<1310022354.456735@proxy.dienste.wien.at>
In reply to	#190

Don Y wrote:
> The sorts of relations are all predefined.  A (cell phone) "user"
> could opt to add a name to his "contact list".  Or, delete one.
> Or "edit" an existing contact.  Or, purge the log of the least
> recent calls.  etc.
>
> He might even want to add "another" contact list (named "business
> contacts"), etc.

This is pretty vague, but it sounds like nothing you need a
superuser account for.
Keep administrative stuff (VACUUM, backups) separate from the rest.

> But, he's not creating any arbitrary relations.  And, the queries
> he'll run are predefined -- with "fill in the blank" parameters, etc.

Yup, that's exactly where the user can use SQL injection to break
into your database.

> I.e., if PostgreSQL can't keep the relations intact, then it has
> fundamental bugs (in which case, why would *anyone* be using it?).

You misunderstood me. PostgreSQL will keep its stuff consistent.
If somebody breaks into your database with a superuser account,
he or she can very consistently read and change everything in the
database, access the file system, and theoretically do anything with
your machine that the OS user has permissions to do.

> <shrug>  We'll see.  AFAIK, it hasn't been tried on this large
> a scale previously (?)  As I said, the things I'm expecting from
> the DBMS are probably different than what most users/DBA's would
> expect.

I don't want to play Kassandra here, but most of the people who
want to use a software for something else than the intended use
become quite unhappy in the end.

Yours,
Laurenz Albe

[toc] | [prev] | [next] | [standalone]

#192 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-07 00:34 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iv3nh8$9kk$1@speranza.aioe.org>
In reply to	#191

Hi Laurenz,

On 7/7/2011 12:05 AM, Laurenz Albe wrote:
> Don Y wrote:
>> The sorts of relations are all predefined.  A (cell phone) "user"
>> could opt to add a name to his "contact list".  Or, delete one.
>> Or "edit" an existing contact.  Or, purge the log of the least
>> recent calls.  etc.
>>
>> He might even want to add "another" contact list (named "business
>> contacts"), etc.
>
> This is pretty vague, but it sounds like nothing you need a
> superuser account for.

I'm not claiming to need a "privileged" account.  Rather, there is
an agent (a piece of software) that sits between the "user" and
the DBMS that interacts with the DB on behalf of the user and
reports back to the user.

So, it can run a query -- and then manually vacuum, etc.

Resources are scarce/fixed so you can't just arbitrarily do
something and let the pieces fall where they may.  Instead,
you have to interact with the DB in ways that are more
"economical" (in space/time/power/etc.).

> Keep administrative stuff (VACUUM, backups) separate from the rest.
>
>> But, he's not creating any arbitrary relations.  And, the queries
>> he'll run are predefined -- with "fill in the blank" parameters, etc.
>
> Yup, that's exactly where the user can use SQL injection to break
> into your database.

That's why there's an agent in the middle to "sanitize" all
interactions with the DBMS.  The interface to the DBMS is
never "raw".

>> I.e., if PostgreSQL can't keep the relations intact, then it has
>> fundamental bugs (in which case, why would *anyone* be using it?).
>
> You misunderstood me. PostgreSQL will keep its stuff consistent.

Then there won't be any problems.  :>

> If somebody breaks into your database with a superuser account,
> he or she can very consistently read and change everything in the
> database, access the file system, and theoretically do anything with
> your machine that the OS user has permissions to do.

Understood.  But, the same vulnerabilities exist in any PostgreSQL
instance.  Less so, here, because the device isn't sitting on a
network, "exposed".

>> <shrug>   We'll see.  AFAIK, it hasn't been tried on this large
>> a scale previously (?)  As I said, the things I'm expecting from
>> the DBMS are probably different than what most users/DBA's would
>> expect.
>
> I don't want to play Kassandra here, but most of the people who
> want to use a software for something else than the intended use
> become quite unhappy in the end.

I don't see that what we are doing is in any way different from
what a PostgreSQL user could expect from a DB.  The differences
are that our "users" are applications -- not "organic beings".
And, that there is typically only a single "organic" user
associated with each PG instance.

[toc] | [prev] | [next] | [standalone]

#185 — Re: Is PostgreSQL good?

From	Matthew Woodcraft <mattheww@chiark.greenend.org.uk>
Date	2011-07-04 20:12 +0100
Subject	Re: Is PostgreSQL good?
Message-ID	<hOp*OqiHt@news.chiark.greenend.org.uk>
In reply to	#180

In article <iul48k$9v1$1@speranza.aioe.org>, Don Y  <nowhere@here.com> wrote:
> Are those parameters effectively "hints of a different color"?
> I.e., hints regarding the makeup of the *data* themselves
> instead of a particular *query*?

As Laurenz has explained, they're not.

But as I understand it, the PostgreSQL developers who are opposed to
adding 'traditional' hints think that the kind of hint you're
describing here is worth considering.

-M-

[toc] | [prev] | [next] | [standalone]

#186 — Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Date	2011-07-04 12:24 -0700
Subject	Re: Is PostgreSQL good?
Message-ID	<iut40n$bao$1@speranza.aioe.org>
In reply to	#185

Hi Mathew,

On 7/4/2011 12:12 PM, Matthew Woodcraft wrote:
> In article<iul48k$9v1$1@speranza.aioe.org>, Don Y<nowhere@here.com>  wrote:
>> Are those parameters effectively "hints of a different color"?
>> I.e., hints regarding the makeup of the *data* themselves
>> instead of a particular *query*?
>
> As Laurenz has explained, they're not.
>
> But as I understand it, the PostgreSQL developers who are opposed to
> adding 'traditional' hints think that the kind of hint you're
> describing here is worth considering.

Because that information could be exploited by "smarter"
incarnations of the planner?

I.e., their philosophy being to let the planner do the thinking
and give it whatever information might be *helpful* to facilitate
its coming to a "better plan" and then just invest (development)
effort in *making* that "smarter planner"?

Presumably, the planner wouldn't be *obligated* to do as those
"non-hint hints" would otherwise suggest?

--don

[toc] | [prev] | [next] | [standalone]

#145

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-06-23 17:48 +0200
Message-ID	<96h5hqFo39U1@mid.individual.net>
In reply to	#139

On 06/21/2011 03:38 PM, Mladen Gogala wrote:
> On Tue, 21 Jun 2011 09:18:07 +0200, Laurenz Albe wrote:
>
>> PS: Don't let people discourage you. PostgreSQL is a good and stable
>> database system.
>
> It is stable, no contest there. Is it good or not is a matter of opinion.

Unfortunately people often forget that "good" in itself is meaningless 
and do not provide a list of criteria as you do.

> I could list several grave faults which make it less than good in my eyes.
> The first and foremost is, of course, the lack of hints. Postgres is the
> only major database system which lacks hints. The second is ludicrous
> need to "vacuum" database.

Why do you consider this bad?  I agree that it sounds silly at first 
sight that the user needs to take care of cleaning up deadwood in the 
database.  On the other hand, on an Oracle instance you'll likely also 
have periodic jobs doing statistics updates which is what VACUUMing also 
does.  Reducing the work a transaction has to do can be beneficial for 
the client, too.

> The third is the fact that Postgres doesn't do
> multi-block reads.

Isn't this alleviated by the OS doing readahead on many platforms?

> Fixed size archive logs (16M) would be the next issue.

This can be changed during compilation though.  But I agree, you'd 
rather want this to be tuneable at runtime - at least without 
recompiling the software.

> The lack of shared pool makes it impossible to see the plan of the
> statement being executed. And there is more.

That's unfortunate.  At least you can see SQL currently executed by 
querying pg_stat_activity.

Kind regards

	robert

[toc] | [prev] | [next] | [standalone]

#146

From	Mladen Gogala <no@email.here.invalid>
Date	2011-06-23 16:42 +0000
Message-ID	<pan.2011.06.23.16.42.52@email.here.invalid>
In reply to	#145

On Thu, 23 Jun 2011 17:48:09 +0200, Robert Klemme wrote:

> Why do you consider this bad?  I agree that it sounds silly at first
> sight that the user needs to take care of cleaning up deadwood in the
> database.  On the other hand, on an Oracle instance you'll likely also
> have periodic jobs doing statistics updates which is what VACUUMing also
> does.  Reducing the work a transaction has to do can be beneficial for
> the client, too.

No, what I consider bad is the architecture of putting dead rows in the 
table. That prevents Postgres from assigning a fixed row id to every row 
and from creating global indexes on partitioned tables. I believe that 
they should consider putting previous versions of the row in a separate 
table and have the vacuum process, let's call it "smon" for the sake of 
argument, do a periodical cleanup. That way rows in the table are not a 
moving target, there is a possibility of assigning a permanent row id, 
which would also allow creating global indexes.

> 
>> The third is the fact that Postgres doesn't do multi-block reads.
> 
> Isn't this alleviated by the OS doing readahead on many platforms?

Weeellllll....it is. You are right, my objection is mostly of aesthetic 
nature. When I come to think of it, it does simplify the optimizer 
because there is only one type of I/O to think about.

> 
>> Fixed size archive logs (16M) would be the next issue.
> 
> This can be changed during compilation though.  But I agree, you'd
> rather want this to be tuneable at runtime - at least without
> recompiling the software.
> 
>> The lack of shared pool makes it impossible to see the plan of the
>> statement being executed. And there is more.
> 
> That's unfortunate.  At least you can see SQL currently executed by
> querying pg_stat_activity.
> 
> Kind regards

Oh yes, and there is also auto explain module which will dump the plan of 
any statement running longer than a predefined period of time (tunable). 
What I really miss is the ability to click on the session in PgAdmin3 and 
see the plan of a running statement. Also, sorting by the CPU time or the 
number of I/O requests issued would be nice.

-- 
http://mgogala.byethost5.com

[toc] | [prev] | [next] | [standalone]

#147

From	Mladen Gogala <no@email.here.invalid>
Date	2011-06-23 18:57 +0000
Message-ID	<pan.2011.06.23.18.57.33@email.here.invalid>
In reply to	#146

On Thu, 23 Jun 2011 16:42:52 +0000, Mladen Gogala wrote:

> I believe that
> they should consider putting previous versions of the row in a separate
> table

What I meant is "a separate file".



-- 
http://mgogala.byethost5.com

[toc] | [prev] | [next] | [standalone]

#153

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-06-26 17:35 +0200
Message-ID	<96p1t7Fcg6U1@mid.individual.net>
In reply to	#146

On 06/23/2011 06:42 PM, Mladen Gogala wrote:
> On Thu, 23 Jun 2011 17:48:09 +0200, Robert Klemme wrote:
>
>> Why do you consider this bad?  I agree that it sounds silly at first
>> sight that the user needs to take care of cleaning up deadwood in the
>> database.  On the other hand, on an Oracle instance you'll likely also
>> have periodic jobs doing statistics updates which is what VACUUMing also
>> does.  Reducing the work a transaction has to do can be beneficial for
>> the client, too.
>
> No, what I consider bad is the architecture of putting dead rows in the
> table. That prevents Postgres from assigning a fixed row id to every row
> and from creating global indexes on partitioned tables. I believe that
> they should consider putting previous versions of the row in a separate
> table and have the vacuum process, let's call it "smon" for the sake of
> argument, do a periodical cleanup. That way rows in the table are not a
> moving target, there is a possibility of assigning a permanent row id,
> which would also allow creating global indexes.

Aha, thanks for clarifying!  I think I need to read up a bit on 
PostgreSQL architecture.  Generally both approaches (copy on write, 
backup old version) work and PostgreSQL simply did choose another way to 
do it than Oracle.  I am not sure yet which one is better.  When I think 
about infamous ORA-01555 I would at least say that Oracle's approach has 
caused its share of grief for DB users. :-)  Which approach is best 
certainly depends on usage patterns (how much concurrency, ratio of read 
to write activity, inserts vs. deletes vs. updates etc.).

>>> The third is the fact that Postgres doesn't do multi-block reads.
>>
>> Isn't this alleviated by the OS doing readahead on many platforms?
>
> Weeellllll....it is. You are right, my objection is mostly of aesthetic
> nature. When I come to think of it, it does simplify the optimizer
> because there is only one type of I/O to think about.

:-)

>>> Fixed size archive logs (16M) would be the next issue.
>>
>> This can be changed during compilation though.  But I agree, you'd
>> rather want this to be tuneable at runtime - at least without
>> recompiling the software.
>>
>>> The lack of shared pool makes it impossible to see the plan of the
>>> statement being executed. And there is more.
>>
>> That's unfortunate.  At least you can see SQL currently executed by
>> querying pg_stat_activity.
>
> Oh yes, and there is also auto explain module which will dump the plan of
> any statement running longer than a predefined period of time (tunable).
> What I really miss is the ability to click on the session in PgAdmin3 and
> see the plan of a running statement. Also, sorting by the CPU time or the
> number of I/O requests issued would be nice.

So it's rather an issue of the administration UI and not of the database 
itself.  It wasn't that long ago that even Oracle did not have a 
graphical UI for administration - IIRC it was Oracle 10 which packaged 
Enterprise Manager with the DB by default the first time.  At that time 
I still considered Microsoft SQL Server's tools better.

Kind regards

	robert

[toc] | [prev] | [next] | [standalone]

#154

From	Mladen Gogala <gogala.mladen@gmail.com>
Date	2011-06-26 19:08 +0000
Message-ID	<pan.2011.06.26.19.08.18@gmail.com>
In reply to	#153

On Sun, 26 Jun 2011 17:35:02 +0200, Robert Klemme wrote:

> IIRC it was Oracle 10 which packaged
> Enterprise Manager with the DB by default the first time.  

Yes, but there were both commercial and freeware monitors like Karma, 
Tora, Oracletool and others. There is nothing like that for Postgres. The 
reason for that is that there cannot be because Postgres doesn't store 
the execution plan in a shared location where it can be picked up and 
displayed. There is no way of seeing the execution plan of a running SQL 
statement.

> At that time I still considered Microsoft SQL Server's tools better.

I still do.

-- 
http://mgogala.byethost5.com

[toc] | [prev] | [next] | [standalone]

#155

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-06-27 00:32 -0700
Message-ID	<225b3c7a-66d2-459d-ba55-4d202a780433@n5g2000yqh.googlegroups.com>
In reply to	#146

On 23 Jun., 18:42, Mladen Gogala <n...@email.here.invalid> wrote:
> On Thu, 23 Jun 2011 17:48:09 +0200, Robert Klemme wrote:
> > Why do you consider this bad?  I agree that it sounds silly at first
> > sight that the user needs to take care of cleaning up deadwood in the
> > database.  On the other hand, on an Oracle instance you'll likely also
> > have periodic jobs doing statistics updates which is what VACUUMing also
> > does.  Reducing the work a transaction has to do can be beneficial for
> > the client, too.
>
> No, what I consider bad is the architecture of putting dead rows in the
> table. That prevents Postgres from assigning a fixed row id to every row
> and from creating global indexes on partitioned tables. I believe that
> they should consider putting previous versions of the row in a separate
> table and have the vacuum process, let's call it "smon" for the sake of
> argument, do a periodical cleanup. That way rows in the table are not a
> moving target, there is a possibility of assigning a permanent row id,
> which would also allow creating global indexes.

I think we need to separate the ROWID issue from having global indexes
for partitioned tables.  I can think of ways to make a global index
work without the existence of ROWID.  All it would need would be a PK
per partition and then the global index can use the partition key and
the local PK to find a row.  Granted, that would add logical read
operations, but: when partitioning tables you typically do it to gain
efficiency during querying by partition pruning and to make management
of the data easier (e.g. dropping an obsolete partition).  So a lookup
which does not use partition pruning should be avoided anyway (or:
done rarely) so the overall price paid might not be too high.

For completeness reasons, there's also HOT to deal more efficiently
with updates:
http://pgsql.tapoueh.org/site/html/misc/hot.html

Kind regards

robert

[toc] | [prev] | [next] | [standalone]

#160

From	Mladen Gogala <no@email.here.invalid>
Date	2011-06-28 16:00 +0000
Message-ID	<pan.2011.06.28.16.00.38@email.here.invalid>
In reply to	#155

On Mon, 27 Jun 2011 00:32:44 -0700, Robert Klemme wrote:

> I think we need to separate the ROWID issue from having global indexes
> for partitioned tables.  I can think of ways to make a global index work
> without the existence of ROWID.

Well, it would be presumptuous of me to claim otherwise. Having a global 
ROWID does look like a simple solution to the problem, but there probably 
are other solutions. Unfortunately, none of those exist in the world of 
Postgres yet. Neither does this: http://tinyurl.com/38fgjwg

-- 
http://mgogala.byethost5.com

[toc] | [prev] | [next] | [standalone]

#161

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-06-28 22:28 +0200
Message-ID	<96urqqFg22U1@mid.individual.net>
In reply to	#160

On 28.06.2011 18:00, Mladen Gogala wrote:
> On Mon, 27 Jun 2011 00:32:44 -0700, Robert Klemme wrote:
>
>> I think we need to separate the ROWID issue from having global indexes
>> for partitioned tables.  I can think of ways to make a global index work
>> without the existence of ROWID.
>
> Well, it would be presumptuous of me to claim otherwise. Having a global
> ROWID does look like a simple solution to the problem, but there probably
> are other solutions. Unfortunately, none of those exist in the world of
> Postgres yet. Neither does this: http://tinyurl.com/38fgjwg

That's parallel query in MySQL, isn't it?  This looks good.  I think I 
agree with you: for really huge datasets there should be some built in 
parallelism in order to be able to truly use all IO bandwidth available 
and not get limited by a single core.

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [standalone]

Page 3 of 3 — ← Prev page 1 2 [3]

csiph-web

migrating oracle to postgres

Contents

#179 — Re: Is PostgreSQL good?

#180 — Re: Is PostgreSQL good?

#183 — Re: Is PostgreSQL good?

#184 — Re: Is PostgreSQL good?

#187 — Re: Is PostgreSQL good?

#188 — Re: Is PostgreSQL good?

#189 — Re: Is PostgreSQL good?

#190 — Re: Is PostgreSQL good?

#191 — Re: Is PostgreSQL good?

#192 — Re: Is PostgreSQL good?

#185 — Re: Is PostgreSQL good?

#186 — Re: Is PostgreSQL good?

#145

#146

#147

#153

#154

#155

#160

#161