Groups | Search | Server Info | Keyboard shortcuts | Login | Register


Groups > comp.lang.java.programmer > #3828

Re: SQL Puzzle - too many dimensions

From Robert Klemme <shortcutter@googlemail.com>
Newsgroups comp.lang.java.programmer
Subject Re: SQL Puzzle - too many dimensions
Date 2011-05-08 23:37 +0200
Message-ID <92ogohFnodU1@mid.individual.net> (permalink)
References <tnibs6hnfdp2cav0u74d8heeqsdkvg1c6a@4ax.com> <3Boxp.9547$HF3.5013@newsfe03.iad> <92nbkrFlrdU2@mid.individual.net> <Iyxxp.62513$0s5.452@newsfe17.iad>

Show all headers | View raw


On 08.05.2011 16:14, Arved Sandstrom wrote:
> On 11-05-08 08:03 AM, Robert Klemme wrote:
>> On 08.05.2011 06:02, Arved Sandstrom wrote:
>>> On 11-05-07 08:13 PM, Zapanaz wrote:
>>>>
>>>> I know this is a java group, not a SQL group, I am trying to bounce
>>>> this off as many heads as I can and most Java programmers I know are
>>>> good at SQL.
>>>>
>>>> So I've got a table, song_versions, in an Oracle database. It has
>>>> columns like this
>>>>
>>>> SONG_ID | SONG_VERSION_ID | RATING |
>>>>
>>>> SONG_ID is a key into the table songs. A song could be something like
>>>> this, 'Money (That's What I Want)' by Berry Gordie
>>>>
>>>> http://www.youtube.com/watch?v=z6xkT7FMyTc
>>>>
>>>> And I have three versions of this song, one by Barrett Strong, one by
>>>> The Beatles, and one by The Rolling Stones.
>>>>
>>>> I also have another song, 'Butterfly Caught' by Massive Attack. I only
>>>> have the one version of this song.
>>>>
>>>> So my table looks like this
>>>>
>>>> SONG_ID | SONG_VERSION_ID | RATING |
>>>> Money (That's What I Want) | Barrett Strong | 3 |
>>>> Money (That's What I Want) | Rolling Stones | 2 |
>>>> Money (That's What I Want) | Beatles | 1 |
>>>> Butterfly Caught | Massive Attack | 1 |
>>>>
>>>> The actual values for the ID columns in the table are integers, this
>>>> is just for clarity.
>>>>
>>>> My job is to come up with one SQL query which will return the
>>>> highest-rated version of each song.
>>>>
>>>> So for Money, the highest  rated version is the Barrett Strong
>>>> version. For Butterfly Caught, there is only one version, so I want to
>>>> return that, the Massive Attack version. So I would want my result to
>>>> be the Barrett Strong version of Money and the Massive Attack version
>>>> fo Butterfuly Caught.
>>>>
>>>> I can't think of a way to do it.
>>>>
>>>> To get the MAX(Rating), I would have to GROUP BY song_id. So I can get
>>>> the numerical highest rating for each song, I can get a
>>>>
>>>> song_id | rating
>>>>
>>>> result set, but what I need is the song_version_id corresponding to
>>>> that rating.
>>>>
>>>> Or I don't know, maybe something like
>>>>
>>>> select song_version_id
>>>>    from song_versions
>>>>    where rating =
>>>>     (select max(rating)
>>>>       from song_versions
>>>>       where song_id = [??? I don't know ... the same song ID I'm talking
>>>> about in the outer query]
>>>>     )
>>>>
>>>> If I group by song_id, I lose the song_version_id.
>>>>
>>>> (Some DBs maybe I could cheat and return columns that aren't group by
>>>> columns, but oracle won't let me pull that one.
>>>>
>>>> I just tried this in MySql
>>>>
>>>> select song_id, song_version_id, max(rating)
>>>>    from song_versions
>>>>    group by song_id
>>>>
>>>> and it worked, I got the Barrett Strong song_version_id, but then
>>>> there's nothing in my query that actually gaurantees that result, so I
>>>> don't know if I would want to depend on that in real code.
>>>>
>>>> But anyway, this is in an Oracle DB, and they don't let you include
>>>> columns in a group-by query that aren't explicitly group-by that way.
>>>> )
>>>>
>>> You can try something like
>>>
>>> select s.song_key, s.rating
>>> from (
>>>      select song_key, max(rating) max_rating
>>>      from songs group by song_key
>>> ) mx inner join songs s
>>> on s.song_key = mx.song_key and s.rating = mx.max_rating;
>>
>> Just an additional note: This is only necessary if you want to pull more
>> columns from s.  If query is only interested in s.song_key, s.rating you
>> only need the inline view's query.
>
> That's correct - thanks for the elaboration. The full SQL I provided
> above is as you said, more general purpose.

Actually I (or we) overlooked that OP wanted the song_version_id - that 
would be one additional column.  So that then would be

select s.song_version_id, s.rating
from (
      select song_id, max(rating) max_rating
      from songs
      group by song_id
) mx
inner join songs s
on s.song_id = mx.song_id
and s.rating = mx.max_rating;

>>> For that example I presupposed a table with columns like
>>>
>>> SONG_ID   SONG_KEY   RATING
>>>
>>> where the SONG_ID (for my own sanity) is simply a primary Key "id",
>>> SONG_KEY is like your "SONG_ID", and RATING is "RATING".
>>>
>>> On Oracle you could also do something like
>>>
>>> select distinct song_key, max(rating) over (partition by song_key)
>>> max_rating from songs;
>>>
>>> which makes use of Oracle analytic functions.
>>
>> This should also work on PostgreSQL.

> Good to know. I'm used to the Oracle ones, but I'm certainly no DBA in
> general. Now that you mentioned this, I researched things some. It looks
> like not only Oracle but also DB2 have had a rich set of analytic
> functions for quite a long time (I think DB2 calls them OLAP functions),
> SQL Server has been incorporating them since SQL Server 2005 (SQL Server
> 2008 presumably better; I think SQL Server calls these ranking
> functions), and PostgreSQL has them now also (generally speaking ANSI
> SQL calls these things window functions; PostgreSQL 8.4 introduced
> support and PostgreSQL 9.x is presumably better at it).

Actually SQL Server's SQL engine isn't too bad.  With CTE (common table 
expression, Oracle has it, too, but calls it "subquery factoring clause" 
- and it has slightly different functionality IIRC [1]) you could do 
recursive queries (since 2005 IIRC) that in Oracle you needed non 
standard CONNECT BY syntax for.

> I use PostgreSQL regularly on my MacBook, but I've been stuck on 8.3 for
> a few years.
>
> MySQL I know very little about. The few articles I located on MySQL and
> analytic functions gave me the impression that the support is poor or
> missing, but perhaps I am wrong.

Same here but I wonder what the optimizer can do with arbitrary storage 
engines.  But this is a completely different topic...

Cheers

	robert


[1] 
http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_10002.htm#i2161315

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

SQL Puzzle - too many dimensions Zapanaz <http://joecosby.com/code/mail.pl@foo.com> - 2011-05-07 16:13 -0700
  Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-08 14:00 +1200
    Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-08 13:08 +0200
      Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-09 00:01 +1200
        Re: SQL Puzzle - too many dimensions Lew <noone@lewscanon.com> - 2011-05-08 08:08 -0400
  Re: SQL Puzzle - too many dimensions Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-05-08 01:02 -0300
    Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-08 13:03 +0200
      Re: SQL Puzzle - too many dimensions Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-05-08 11:14 -0300
        Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-08 23:37 +0200
    Re: SQL Puzzle - too many dimensions Zapanaz <http://joecosby.com/code/mail.pl@foo.com> - 2011-05-08 16:06 -0700
      Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-09 22:10 +0200
        Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-10 13:50 +1200
          Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-10 00:06 -0700
            Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-13 22:55 +1200
              Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 18:52 +0200
          Re: SQL Puzzle - too many dimensions Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-05-10 06:35 -0300
            Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-13 22:57 +1200
              Re: SQL Puzzle - too many dimensions Lew <noone@lewscanon.com> - 2011-05-13 07:45 -0400
              Re: SQL Puzzle - too many dimensions Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-05-15 13:09 -0300
                Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 18:56 +0200
  Re: SQL Puzzle - too many dimensions Roedy Green <see_website@mindprod.com.invalid> - 2011-05-07 23:32 -0700
    Re: SQL Puzzle - too many dimensions Roedy Green <see_website@mindprod.com.invalid> - 2011-05-07 23:36 -0700
      Re: SQL Puzzle - too many dimensions Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-05-08 20:43 +1200
        Re: SQL Puzzle - too many dimensions Robert Klemme <shortcutter@googlemail.com> - 2011-05-09 01:27 -0700
  Re: SQL Puzzle - too many dimensions Lew <noone@lewscanon.com> - 2011-05-08 07:45 -0400

csiph-web