Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!newsfeed.utanet.at!newscore.univie.ac.at!aconews-feed.univie.ac.at!aconews.univie.ac.at!not-for-mail
From: "Laurenz Albe" <invite@spam.to.invalid>
Newsgroups: comp.databases.postgresql
References: <25ee8bf2-869c-427b-a4a4-5faf86f92d57@o26g2000vbi.googlegroups.com><WS*So6Lt@news.chiark.greenend.org.uk> <pan.2011.08.31.23.00.27@gmail.com><1314860211.105064@proxy.dienste.wien.at><pan.2011.09.01.12.57.52@gmail.com><1314888079.999065@proxy.dienste.wien.at> <pan.2011.09.01.21.26.13@gmail.com>
Subject: Re: Case-insensitive select
Date: Fri, 2 Sep 2011 10:08:19 +0200
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
Organization: MagistratWien newsserver
Message-Id: <1314950920.382293@proxy.dienste.wien.at>
X-Cache: nntpcache 3.0.1 (see http://www.nntpcache.org/)
X-Cache: nntpcache 2.3.3 (see http://www.nntpcache.org/)
Lines: 40
NNTP-Posting-Host: 141.203.254.23
X-Trace: 1314950924 aconews.univie.ac.at 38260 141.203.254.23
X-Complaints-To: abuse@univie.ac.at
Xref: x330-a1.tempe.blueboxinc.net comp.databases.postgresql:228

Mladen Gogala wrote:
> It seems to me that citext is a good replacement for the type text, not
> for the type varchar(n).

Well, it's called "citext" and not "civarchar" for a reason.
But you can easily

CREATE DOMAIN civarchar20 AS citext
   CONSTRAINT civarchar20_len CHECK(length(VALUE) <= 20);

to get that.

>                I would still use  the standard varchar type for the
> vast majority of cases and use trigger and/or functional indexes to
> resolve the case insensitive search.

It's everybody's choice.

>           The trick to make it perform better was to add a column,
> named something like lower_col1 and populate it by a trigger. That would
> help the optimizer, at the expense of violating the relational rules and
> storing the duplicate information. That column could then be indexed
> normally. This method still provides the best performance, on all
> databases I know of, with the notable exception of Oracle 11i.

Of course it also wastes space space, and triggers don't come for
free either.

I doubt that using an index based on an expression would perform worse
in PostgreSQL, but I don't have the time to run a performance test.

At any rate statistics for such indexes are collected (look for rows
in pg_stats that belong to the index), and these statistice are exactly
the same as would be collected for the redundant column, so why should
there be a difference?

Yours,
Laurenz Albe