Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.databases > #516
| From | David Lee Lambert <davidl@lmert.com> |
|---|---|
| Newsgroups | comp.lang.java.databases |
| Subject | Re: Searching for ranges |
| Date | 2012-01-17 11:59 -0800 |
| Organization | http://groups.google.com |
| Message-ID | <50f287a6-be60-4185-a5ba-2aed287e91e2@z8g2000pbe.googlegroups.com> (permalink) |
| References | <nqukc7ta1mrd128h8tqbabo9k9e76eq7ak@4ax.com> |
On 21 nov 2011, 11:56, Roedy Green <see_webs...@mindprod.com.invalid> wrote: > I was wondering about techniques for using SQL to find records with a > low-high associated range, especially when the ranges are contiguous, [...] > > I am familiar with in-RAM technique such as binary search and dividing > by the range size/atomicity to get an index. > > I suspect just asking > for table.low <= wanted < table.high > won't be that clever, even if low and high are indexed. As long as low OR high is indexed, and assuming the ranges are small enough that it is actually better than a full-table scan, any good DBMS will make use of the index to speed up the query without much trickery. In fact, if the set of ranges is small enough, the whole index will probably be in one B-tree page, with the keys in order in memory, and the DBMS will actually use a binary search to look values up within the page. If you have a sequence of contiguous non-overlapping ranges, you don't need two columns; you just need to "SELECT ... WHERE table.low <= wanted ORDER BY table.low ASC LIMIT 1" or "SELECT ... WHERE table.low <= wanted AND table.id NOT IN (SELECT id FROM table t2 WHERE t2.low >= wanted)" or any other way of expressing it in SQL. However, I wonder if putting such a table out in the database is really the most robust, maintainable or high-performance way to deal with the applications you list lower down... > The pattern of cases defined by bounds, particularly when the bounds > are contiguous comes up quite frequently. I am surprised there are > not more built in features in languages, Collections and databases to > efficiently handle it. > > Some places where this band-searching problem comes up: > > converting a random number 0 <= r < 1 to select a weighted quotation. Yes. > deciding which unit of measure to use depending on its size. For this I think an array lookup by "Math.floor(Math.log(x)/ LOG_OF_1000)" is simpler and faster than a network round-trip (or local context switch) to the database. Of course, if you're writing database stored procedures and you expect the unit names to actually change anytime, that might make a database table more attractive... > deciding which rule to use for inserting dashes in a ISBN depending on > is [?] band. The rules for this fit on a single page of human-readable text, I don't see how anything involving SQL parsing and a database lookup could possibly be more efficient than just implementing them in a procedural language. See for example http://waltshiel.com/2008/06/22/isbn-how-to-decode-it/ . > UTF-8 encoding The rules for this are even simpler, and we hope already solved as part of Java NIO. The only thing to tweak is whether one-byte, two- byte, three-byte, or four-byte sequences are expected to be most common, which might affect what optimization hints you pass to a highly optimizing C compiler. > calculations with coins. There I'll grant you it could change at runtime for a sufficiently long-running program; but I still suspect that the most practically efficient algorithms will start by loading all the coin-values for a particular currency into memory at the beginning of a calculation. > naming colours If you're thinking of loading X's "rgb.txt" and computing the closest match to a color, etc... Several databases support 2-D geometric types with corresponding indexes. However, I can't find documentation of any mainstream database with out-of-the-box support for 3-D indexes. See the following comment about PostGIS... http://www.mail-archive.com/postgis-users@postgis.refractions.net/msg03821.html -- DLL
Back to comp.lang.java.databases | Previous | Next — Previous in thread | Next in thread | Find similar
Searching for ranges Roedy Green <see_website@mindprod.com.invalid> - 2011-11-21 08:56 -0800
Re: Searching for ranges Martin Gregorie <martin@address-in-sig.invalid> - 2011-11-21 21:18 +0000
Re: Searching for ranges David Lee Lambert <davidl@lmert.com> - 2012-01-17 11:59 -0800
Re: Searching for ranges Roedy Green <see_website@mindprod.com.invalid> - 2012-01-19 09:32 -0800
csiph-web