Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.databases > #516

Re: Searching for ranges

From David Lee Lambert <davidl@lmert.com>
Newsgroups comp.lang.java.databases
Subject Re: Searching for ranges
Date 2012-01-17 11:59 -0800
Organization http://groups.google.com
Message-ID <50f287a6-be60-4185-a5ba-2aed287e91e2@z8g2000pbe.googlegroups.com> (permalink)
References <nqukc7ta1mrd128h8tqbabo9k9e76eq7ak@4ax.com>

Show all headers | View raw


On 21 nov 2011, 11:56, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:
> I was wondering about techniques for using SQL to find records with a
> low-high associated range, especially when the ranges are contiguous, [...]
>
> I am familiar with in-RAM technique such as binary search and dividing
> by the range size/atomicity to get an index.
>
> I suspect just asking
> for  table.low <= wanted < table.high
> won't be that clever, even if low and high are indexed.

As long as low OR high is indexed, and assuming the ranges are small
enough that it is actually better than a full-table scan, any good
DBMS will make use of the index to speed up the query without much
trickery.  In fact, if the set of ranges is small enough,  the whole
index will probably be in one B-tree page,  with the keys in order in
memory, and the DBMS will actually use a binary search to look values
up within the page.

If you have a sequence of contiguous non-overlapping ranges,  you
don't need two columns;  you just need to "SELECT ... WHERE table.low
<= wanted ORDER BY table.low ASC LIMIT 1" or "SELECT ... WHERE
table.low <= wanted AND table.id NOT IN (SELECT id FROM table t2 WHERE
t2.low >= wanted)" or any other way of expressing it in SQL.

However,  I wonder if putting such a table out in the database is
really the most robust, maintainable or high-performance way to deal
with the applications you list lower down...

> The pattern of cases defined by bounds, particularly when the bounds
> are contiguous comes up quite frequently.  I am surprised there are
> not more built in features in languages, Collections and databases to
> efficiently handle it.
>
> Some places where this band-searching problem comes up:
>
> converting a random number 0 <= r < 1  to select a weighted quotation.

Yes.

> deciding which unit of measure to use depending on its size.

For this I think an array lookup by  "Math.floor(Math.log(x)/
LOG_OF_1000)" is simpler and faster than a network round-trip (or
local context switch) to the database.  Of course,  if you're writing
database stored procedures and you expect the unit names to actually
change anytime,  that might make a database table more attractive...

> deciding which rule to use for inserting dashes in a ISBN depending on
> is [?] band.

The rules for this fit on a single page of human-readable text,  I
don't see how anything involving SQL parsing and a database lookup
could possibly be more efficient than just implementing them in a
procedural language.

See for example  http://waltshiel.com/2008/06/22/isbn-how-to-decode-it/
.

> UTF-8 encoding

The rules for this are even simpler,  and we hope already solved as
part of Java NIO.  The only thing to tweak is whether one-byte, two-
byte, three-byte, or four-byte sequences are expected to be most
common, which might affect what optimization hints you pass to a
highly optimizing C compiler.

> calculations with coins.

There I'll grant you it could change at runtime for a sufficiently
long-running program; but I still suspect that the most practically
efficient algorithms will start by loading  all the coin-values for a
particular currency into memory at the beginning of a calculation.

> naming colours

If you're thinking of loading X's "rgb.txt" and computing the closest
match to a color,  etc... Several databases support 2-D geometric
types with corresponding indexes.  However, I can't find documentation
of any mainstream database with out-of-the-box support for 3-D
indexes.  See the following comment about PostGIS...

http://www.mail-archive.com/postgis-users@postgis.refractions.net/msg03821.html

--
DLL

Back to comp.lang.java.databases | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Searching for ranges Roedy Green <see_website@mindprod.com.invalid> - 2011-11-21 08:56 -0800
  Re: Searching for ranges Martin Gregorie <martin@address-in-sig.invalid> - 2011-11-21 21:18 +0000
  Re: Searching for ranges David Lee Lambert <davidl@lmert.com> - 2012-01-17 11:59 -0800
    Re: Searching for ranges Roedy Green <see_website@mindprod.com.invalid> - 2012-01-19 09:32 -0800

csiph-web