Groups > comp.databases.postgresql > #180

Re: Is PostgreSQL good?

From	Don Y <nowhere@here.com>
Newsgroups	comp.databases.postgresql
Subject	Re: Is PostgreSQL good?
Date	2011-07-01 11:39 -0700
Organization	Aioe.org NNTP Server
Message-ID	<iul48k$9v1$1@speranza.aioe.org> (permalink)
References	<ito056$ido$1@bruford.hrz.tu-chemnitz.de><1308640710.210659@proxy.dienste.wien.at><pan.2011.06.21.13.38.10@gmail.com><1308738638.232318@proxy.dienste.wien.at> <pan.2011.06.22.12.02.22@gmail.com> <1308911383.819034@proxy.dienste.wien.at> <iuhvrv$5j7$1@speranza.aioe.org> <1309508018.14415@proxy.dienste.wien.at>

Show all headers | View raw

Hi Laurenz,

On 7/1/2011 1:13 AM, Laurenz Albe wrote:
> Don Y wrote:
>> [apologies for not having the technical depth to understand these
>> issues better -- I just use PostgreSQL as a "means to an end"...
>> I could use any similar "means" but this seems to work well enough
>> for me]
>
> No worries, but don't take it as rudeness if I give you links for
> some of these question instead of answering everything in detail.

Understood.

>> Can someone please, *gently*, explain *quantitatively* the differences
>> that these issues make in performance?
>
> Quantifying performance is only possible in a narrowly defined setting
> (in my opinion).

Of course.  I was trying to get an idea as to what was "at
stake".

E.g., a peephole optimizer (in a compiled language) makes very
small improvements -- changing, for example:
    store foo
    load foo
into the equivalent:
    store foo
(since foo is already loaded, no need to REload it).  The
savings are typically insignificant unless in a very *tight*
loop where the cost of that *single* saved instruction represents
a large portion of the cost of a single iteration.

OTOH, rewriting:
     if ((count_all_people_in_the_world() && (today != weekday)) ...
as:
     if ((today != weekday) && (count_all_people_in_the_world()) ...
provides *big* savings as count_all_people_in_the_world() isn't
done *most* days (because today is more often a weekday than a
NONweekday!).

>> Are hints on a par with the 'register' keyword in C?  I.e.,
>> something you *hope* gives the compiler an insight into your
>> algorithm -- but, that the compiler is free to *ignore*?
>> Are there consequences to using a "hint" (e.g., using 'register'
>> places constraints on what you can do)?
>
> Roughly yes. Except that (at least in Oracle) a hint will only
> be ignored if it is impossible to use (you cannot force an index
> that cannot be used for this query).

So, if the hint is "a bad idea", the DBMS won't ignore it, even
if it "knows better"?  I.e., the DBA can *worsen* performance
by hinting incorrectly?

>> Is the "hint" concept essentially a workaround to compensate for
>> capabilities *missing* in the optimizer?  I.e., as C compilers
>> get smarter, they deliberately *choose* to ignore the 'register'
>> keyword because they actually have *more* insight than the
>> programmer!
>
> Yes, it is a workaround for optimizer deficiencies.
> The problem here is that optimizing SQL statements is quite difficult
> because SQL is a declarative language, i.e. you tell the system what
> you want to get and not how to get it. The optimizer will have to
> figure out an execution path.

Understood.  The "hint" plays the same intended role as "register"
in C -- "I, the human, know more about this than you, the computer,
so take my advice and do *this*...".  But, in C, the compiler
can effectively say, "Silly Rabbit, no you *don't*!"

>> Are there ways (without using a "hint") to coerce the database
>> to perform in the way your hint would *hope* to have it behave?
>> (e.g., can you rewrite the SQL to cause the behavior you seek?)
>
> Sometimes, yes. Often a simplified statement will be easier
> to optimize. If you know the optimizer well, you can also trick
> it into not considering certain optimizations.
> The optimizer uses "statistics" which are data about the number
> and distribution of data in table columns. A lot of times the
> optimizer will perform better if you collect more data.
> You can also set database parameters that influence the optimizer.

Are those parameters effectively "hints of a different color"?
I.e., hints regarding the makeup of the *data* themselves
instead of a particular *query*?

>> What sort of *quantitative* difference can this make (example?)
>> that can't, otherwise, be attained?
>
> The effect of choosing a better plan can improve things *a lot*.

Oh.  Are these situations likely to creep in often and *not*
be obvious to the DBA ahead of time?  I.e., why would he
know to "hint" at them yet the optimizer not be able to
foresee the same issue(s)?

> If you fetch 10 million rows, only to discard them later because
> you join with an empty result set, that's *much* worse than not
> fetching anything at all.

Again, why would the DBA know this is likely to be the case
but the optimizer (planner) *doesn't*?

>>> About vacuum:
>>> =============
>>>
>>> Of course vacuum is a pain, but every database system has to pay a similar
>>> price somewhere. Either you reduce concurrency (DB2), or (if you use some kind
>>> of MVCC) you can move dead rows somewhere else (Oracle) or leave them where
>>> they are and clean up later.
>>
>> Is this just the difference between automatic garbage collection
>> and "manual" garbage collection (again, relying on parallels to
>> other "languages")?
>
> No. DB2 does not create garbage, because there is only one version
> of each row at any given time. To get that, a writer has to wait
> until a reader is done (i.e., the reader will take a "read lock"
> on the row he or she is reading).

Understood.  There is, therefore, no need to vacuum.

> Oracle and PostgreSQL both create garbage, because they both keep
> old versions of rows around to avoid readers blocking writers.
> Oracle collects the garbage automatically *while* it executes
> SQL statements.
>
> PostgreSQL collects garbage at some later time, either manually
> by an administrator running the VACUUM command, or automatically
> if the autovacuum daemon is turned on.

That was what I *thought* (intuitively) the role of VACUUM was.
So, *I*, knowing what I have done and am likely to do *next*,
can decide (if auto is off) whether it behooves me to vacuum
*now* or "wait until after this next -- which will be my
LAST -- query, at which time the DB will be idle"

>> Does the parallel maintain throughout operation or does it fall
>> apart at the extremes?  E.g., if garbage collection must be
>> manually initiated (in a language), then failing to do so
>> can eventually cause you to run out of heap, etc. (which
>> causes the application to "fail").  Is the same true when
>> it comes to vacuuming -- or, does performance just suffer?
>> (i.e., will you ever "fill the disk" with dead rows if you
>> fail to initiate a VACUUM operation?)
>
> All of this is only relevant if you have autovacuum turned off
> (autovacuum is the recommended way these days).

Understood.

> Without autovacuum and if VACUUM is never run, two things will
> happen:
> 1) Your tables and indexes will grow on every insert and update,
>     and deletes and updates will create garbage that cannot be
>     reused. That can fill the disk and drastically increase the
>     duration of table and index scans.
> 2) You will at some point run out of transaction IDs, and the
>     system will shutdown to prevent data loss.

So, it's just like turning off the GC and waiting for the
free store to be exhausted.  Sooner or later, there's no
more "memory" available and a memory allocation request
fails -- causing the application to fail at that point
(though the "recovery" routine could simply manually
initiate GC at that point and rerun the request).

> See
> http://www.postgresql.org/docs/current/static/routine-vacuuming.html
> for details.
>
>>> ROWID:
>>> ======
>>>
>>> You make the point that because of VACUUM there is nothing compared to
>>> Oracle's ROWID. But do you really need that? Shouldn't a row be referenced
>>> by its primary key?
>>
>> The last question seems to be my understanding of how things should
>> work. (?)  (actually, one of the harder ideas for me to get accustomed
>> to in this field  :<  )
>
> Right. It was a rethorical question.
>
> To digress, I always find a library a good analogy for a database.
> You have a lot of books there, each has a title and an author.
>
> A primary key is a) an index (like an index in a library) and
> b) a rule that says that there may only be one book for every
> possible index entry.
>
> Without a primary key, one author could have more than one book with
> the same title, and you won't be able to tell them apart (even though
> the books themselves may be different).

Understood.  I tend to create an "ID" field in each table to
act as a primary key (because there often is no other way
to differentiate entries/rows -- and, because this maps
nicely to how I think of "pointers" in my code :> )

>> How does a ROWID differ from using an OID?  (which I presume is
>> not a good thing)
>
> The ROWID is the physical location of the row, basically the offset
> in the file. You can sort of just "lseek(3)" there and get it. This
> is the fastest way to access a row.
>
> An OID is a number from a system counter that *can* also be used
> as a primary key. To some extent, it's just a more old-fashioned
> way of what can be done better with a sequence these days.
> If used as a primary key, it is no faster than other primary keys.

Ah!  Excellent!  That makes the performance aspect very clear.
The OID can be changed, arbitrarily, while changing the ROWID
requires "physically" moving the record.

> See the final paragraphs of
> http://www.postgresql.org/docs/current/static/ddl-system-columns.html
> for details.
>
>>> Fixed size of WAL segments:
>>> ===========================
>>>
>>> Maybe I am dense, but what is the problem there?
>>
>> <grin>   I don't even know what they *are*!  ;-)
>
> WAL stands for "write ahead log" and is a transaction log.
> It is a way to provide durability for committed transactions in case
> of a system failure.

OK.  So, it's just another "missing feature" -- like buying a car
without air conditioning (not necessary if you live in a cold
climate)

> See
> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
> for details.

Thanks for taking the time (and pointers) to shed some light
on this.  Eventually, the WAL might be significant for me but
I doubt hints would ever be as, at a minimum, they would require
me to know a lot more about how the planner worked and how to
convince it to "do something else".

--don

Back to comp.databases.postgresql | Previous | Next — Previous in thread | Next in thread | Find similar

Thread

migrating oracle to postgres Username <user@domain.tld> - 2011-06-20 19:28 +0200
  Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-20 20:08 +0000
    Re: migrating oracle to postgres Username <user@domain.tld> - 2011-06-20 23:45 +0200
      Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-20 22:37 +0000
      Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-21 02:01 +0000
  Re: migrating oracle to postgres "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-21 09:18 +0200
    Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-21 13:38 +0000
      Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-22 12:30 +0200
        Re: Is PostgreSQL good? Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-22 13:15 +0200
        Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-22 12:02 +0000
          Re: Is PostgreSQL good? "M. Strobel" <sorry_no_mail_here@nowhere.dee> - 2011-06-22 21:52 +0200
            Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-22 22:43 +0000
          Re: Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-24 12:29 +0200
            Re: Is PostgreSQL good? Hans Castorp <REWYRLXHEGHO@spammotel.com> - 2011-06-24 12:57 +0200
              Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-24 14:53 +0200
              Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-24 16:32 +0000
            Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-24 13:53 +0000
              Re: Is PostgreSQL good? (was: migrating oracle to postgres) "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-27 11:17 +0200
                Re: Is PostgreSQL good? (was: migrating oracle to postgres) Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-28 06:11 +0000
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-28 10:34 +0200
                Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-28 13:30 +0000
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-29 09:50 +0200
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-29 21:38 +0000
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-06-30 09:14 +0200
                Re: Is PostgreSQL good? Mladen Gogala <mgogala@no.address.invalid> - 2011-06-30 13:22 +0000
            Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 07:05 -0700
              Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 15:41 +0000
                Re: Is PostgreSQL good? Richard Kettlewell <rjk@greenend.org.uk> - 2011-06-30 16:49 +0100
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 19:24 +0000
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 13:21 -0700
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-06-30 22:42 +0000
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-01 11:57 -0700
                Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 22:58 +0000
                Re: Is PostgreSQL good? Richard Kettlewell <rjk@greenend.org.uk> - 2011-07-01 14:07 +0100
                Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 13:29 +0000
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-06-30 13:17 -0700
                Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-30 23:53 +0000
              Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-01 10:13 +0200
                Re: Is PostgreSQL good? Mladen Gogala <gogala.mladen@gmail.com> - 2011-07-01 13:38 +0000
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-01 16:58 +0200
                Re: Is PostgreSQL good? Mladen Gogala <no@email.here.invalid> - 2011-07-01 15:33 +0000
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-01 11:39 -0700
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-04 16:39 +0200
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-04 12:11 -0700
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-05 08:58 +0200
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-05 11:29 -0700
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-06 09:00 +0200
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-06 01:00 -0700
                Re: Is PostgreSQL good? "Laurenz Albe" <invite@spam.to.invalid> - 2011-07-07 09:05 +0200
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-07 00:34 -0700
                Re: Is PostgreSQL good? Matthew Woodcraft <mattheww@chiark.greenend.org.uk> - 2011-07-04 20:12 +0100
                Re: Is PostgreSQL good? Don Y <nowhere@here.com> - 2011-07-04 12:24 -0700
      Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-23 17:48 +0200
        Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-23 16:42 +0000
          Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-23 18:57 +0000
          Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-26 17:35 +0200
            Re: migrating oracle to postgres Mladen Gogala <gogala.mladen@gmail.com> - 2011-06-26 19:08 +0000
          Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-27 00:32 -0700
            Re: migrating oracle to postgres Mladen Gogala <no@email.here.invalid> - 2011-06-28 16:00 +0000
              Re: migrating oracle to postgres Robert Klemme <shortcutter@googlemail.com> - 2011-06-28 22:28 +0200

csiph-web