Groups > comp.databases.postgresql > #861 > unrolled thread

Table with a variable number of elements in a column

Started by	Bill Gunshannon <bill.gunshannon@gmail.com>
First post	2019-04-27 14:59 -0400
Last post	2020-02-22 09:15 +0100
Articles	10 — 4 participants

Back to article view | Back to comp.databases.postgresql

  Table with a variable number of elements in a column Bill Gunshannon <bill.gunshannon@gmail.com> - 2019-04-27 14:59 -0400
    Re: Table with a variable number of elements in a column John-Paul Stewart <jpstewart@sympatico.ca> - 2019-04-27 15:39 -0400
      Re: Table with a variable number of elements in a column Bill Gunshannon <bill.gunshannon@gmail.com> - 2019-04-27 15:48 -0400
        Re: Table with a variable number of elements in a column John-Paul Stewart <jpstewart@sympatico.ca> - 2019-04-27 16:02 -0400
          Re: Table with a variable number of elements in a column Bill Gunshannon <bill.gunshannon@gmail.com> - 2019-04-27 16:29 -0400
            Re: Table with a variable number of elements in a column John-Paul Stewart <jpstewart@sympatico.ca> - 2019-04-27 17:10 -0400
              Re: Table with a variable number of elements in a column Bill Gunshannon <bill.gunshannon@gmail.com> - 2019-04-27 18:40 -0400
                Re: Table with a variable number of elements in a column John-Paul Stewart <jpstewart@sympatico.ca> - 2019-04-27 19:26 -0400
    Re: Table with a variable number of elements in a column alexander.d.mills@gmail.com - 2020-02-21 14:02 -0800
      Re: Table with a variable number of elements in a column Christian Barthel <bch@online.de> - 2020-02-22 09:15 +0100

#861 — Table with a variable number of elements in a column

From	Bill Gunshannon <bill.gunshannon@gmail.com>
Date	2019-04-27 14:59 -0400
Subject	Table with a variable number of elements in a column
Message-ID	<gijn3vF68nnU1@mid.individual.net>


Not sure what the right terminology for this is, but I will
provide my example and see if anyone can tell me how I might
do this.

I want to create a database table for an index of all my record
albums.

The basic stuff is easy.

Title, Artist, Publisher and the Publisher's ID # as a primary
unique key.
But then I get to the hard part.
Number of tracks and then a list of those tracks.
This would, obviously, be different and variable from album to album.

So, I need a way to define a table that has a variable number of fields
depending on the value in Number-of-Tracks.

Is there any way in standard SQL to do this without just having a
large number of fields and only using the number you need?


bill

[toc] | [next] | [standalone]

#862

From	John-Paul Stewart <jpstewart@sympatico.ca>
Date	2019-04-27 15:39 -0400
Message-ID	<gijpelF6n73U1@mid.individual.net>
In reply to	#861

On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:
> 
> 
> Not sure what the right terminology for this is, but I will
> provide my example and see if anyone can tell me how I might
> do this.
> 
> I want to create a database table for an index of all my record
> albums.
> 
> The basic stuff is easy.
> 
> Title, Artist, Publisher and the Publisher's ID # as a primary
> unique key.
> But then I get to the hard part.
> Number of tracks and then a list of those tracks.
> This would, obviously, be different and variable from album to album.
> 
> So, I need a way to define a table that has a variable number of fields
> depending on the value in Number-of-Tracks.

The usual way to do this in a relational database is to have a separate
table of tracks and define the relationship between the two tables.  For
example, add a unique album id number to your albums table.  (You won't
need the "number of tracks" field you propose.)  Then have a tracks
table that has album id (which refers back to the albums table), track
number, track title, etc.  You can then use a JOIN clause in your SQL
SELECT statement to associate the track and album info with each other,
or other queries to get the number of tracks in an album (e.g., "select
count(*) from tracks where album_id = 1"), or whatever else you need to
know about it.

You probably want to read up on the concept of "foreign keys" and SQL
JOIN clauses.  That's the usual way to do it in a database, and a big
part of the relational model.

[toc] | [prev] | [next] | [standalone]

#863

From	Bill Gunshannon <bill.gunshannon@gmail.com>
Date	2019-04-27 15:48 -0400
Message-ID	<gijpvpF6sfbU1@mid.individual.net>
In reply to	#862

On 4/27/19 3:39 PM, John-Paul Stewart wrote:
> On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:
>>
>>
>> Not sure what the right terminology for this is, but I will
>> provide my example and see if anyone can tell me how I might
>> do this.
>>
>> I want to create a database table for an index of all my record
>> albums.
>>
>> The basic stuff is easy.
>>
>> Title, Artist, Publisher and the Publisher's ID # as a primary
>> unique key.
>> But then I get to the hard part.
>> Number of tracks and then a list of those tracks.
>> This would, obviously, be different and variable from album to album.
>>
>> So, I need a way to define a table that has a variable number of fields
>> depending on the value in Number-of-Tracks.
> 
> The usual way to do this in a relational database is to have a separate
> table of tracks and define the relationship between the two tables.  For
> example, add a unique album id number to your albums table.  (You won't
> need the "number of tracks" field you propose.) 

Well, I am going to want that number anyway. :-)

>                                                   Then have a tracks
> table that has album id (which refers back to the albums table), track
> number, track title, etc.  You can then use a JOIN clause in your SQL
> SELECT statement to associate the track and album info with each other,
> or other queries to get the number of tracks in an album (e.g., "select
> count(*) from tracks where album_id = 1"), or whatever else you need to
> know about it.

But if I understand this correctly I would need a separate table for
every album resulting in, potentially, thousands of tables. (OK, in
my case hundreds but others may like this idea, too, when I finish
the whole project.)

> 
> You probably want to read up on the concept of "foreign keys" and SQL
> JOIN clauses.  That's the usual way to do it in a database, and a big
> part of the relational model.
Not sure what the "foreign keys" have to do with it, but I understand
the JOIN part.  I was just looking for an efficient way to do this and
somehow the though of hundreds to thousands of TABLES scares me.

But, thanks for the info.  I will likely end out trying it a couple
of different ways before deciding which is best.

bill

[toc] | [prev] | [next] | [standalone]

#864

From	John-Paul Stewart <jpstewart@sympatico.ca>
Date	2019-04-27 16:02 -0400
Message-ID	<gijqqoF71obU1@mid.individual.net>
In reply to	#863

On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
> On 4/27/19 3:39 PM, John-Paul Stewart wrote:
>> On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:
>>>
>>>
>>> Not sure what the right terminology for this is, but I will
>>> provide my example and see if anyone can tell me how I might
>>> do this.
>>>
>>> I want to create a database table for an index of all my record
>>> albums.
>>>
>>> The basic stuff is easy.
>>>
>>> Title, Artist, Publisher and the Publisher's ID # as a primary
>>> unique key.
>>> But then I get to the hard part.
>>> Number of tracks and then a list of those tracks.
>>> This would, obviously, be different and variable from album to album.
>>>
>>> So, I need a way to define a table that has a variable number of fields
>>> depending on the value in Number-of-Tracks.
>>
>> The usual way to do this in a relational database is to have a separate
>> table of tracks and define the relationship between the two tables.  For
>> example, add a unique album id number to your albums table.  (You won't
>> need the "number of tracks" field you propose.) 
> 
> Well, I am going to want that number anyway. :-)

Yes, but the database can calculate it for you, for each album.  Having
a separate column could lead to problems where the column says something
different than the actual number of tracks stored in the table.

>>                                                   Then have a tracks
>> table that has album id (which refers back to the albums table), track
>> number, track title, etc.  You can then use a JOIN clause in your SQL
>> SELECT statement to associate the track and album info with each other,
>> or other queries to get the number of tracks in an album (e.g., "select
>> count(*) from tracks where album_id = 1"), or whatever else you need to
>> know about it.
> 
> But if I understand this correctly I would need a separate table for
> every album resulting in, potentially, thousands of tables. (OK, in
> my case hundreds but others may like this idea, too, when I finish
> the whole project.)

No, it's only one "tracks" table.  That's why there's an album id column
in it:  so that each track is associated with one album.  There'd be
(album 1, track 1), (album 1, track 2), (album 2, track 1), and so
on....  Each (album, track) tuple would have to be unique.  But you'd
keep all the tracks for all the albums in one table.

>>
>> You probably want to read up on the concept of "foreign keys" and SQL
>> JOIN clauses.  That's the usual way to do it in a database, and a big
>> part of the relational model.
> Not sure what the "foreign keys" have to do with it, 

The foreign key (in this case, the use of album id in the tracks table)
is what connects the individual tracks to the albums, and ensures that
the album id actually exists.  By declaring album id (in the tracks
table) as a foreign key referring to the albums table, Postgres will
ensure that each track has a corresponding album.  Without declaring it
as a foreign key, you can mistakenly have a track who's album id doesn't
exist.

> but I understand
> the JOIN part.  I was just looking for an efficient way to do this and
> somehow the though of hundreds to thousands of TABLES scares me.

It's just two:  albums and tracks.

> But, thanks for the info.  I will likely end out trying it a couple
> of different ways before deciding which is best.

Another topic to read up on is "database normalization".

[toc] | [prev] | [next] | [standalone]

#865

From	Bill Gunshannon <bill.gunshannon@gmail.com>
Date	2019-04-27 16:29 -0400
Message-ID	<gijsdrF7d5iU1@mid.individual.net>
In reply to	#864

On 4/27/19 4:02 PM, John-Paul Stewart wrote:
> On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
>> On 4/27/19 3:39 PM, John-Paul Stewart wrote:
>>> On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:
>>>>
>>>>
>>>> Not sure what the right terminology for this is, but I will
>>>> provide my example and see if anyone can tell me how I might
>>>> do this.
>>>>
>>>> I want to create a database table for an index of all my record
>>>> albums.
>>>>
>>>> The basic stuff is easy.
>>>>
>>>> Title, Artist, Publisher and the Publisher's ID # as a primary
>>>> unique key.
>>>> But then I get to the hard part.
>>>> Number of tracks and then a list of those tracks.
>>>> This would, obviously, be different and variable from album to album.
>>>>
>>>> So, I need a way to define a table that has a variable number of fields
>>>> depending on the value in Number-of-Tracks.
>>>
>>> The usual way to do this in a relational database is to have a separate
>>> table of tracks and define the relationship between the two tables.  For
>>> example, add a unique album id number to your albums table.  (You won't
>>> need the "number of tracks" field you propose.)
>>
>> Well, I am going to want that number anyway. :-)
> 
> Yes, but the database can calculate it for you, for each album.  Having
> a separate column could lead to problems where the column says something
> different than the actual number of tracks stored in the table.

Ah yes, I guess I was too tied into the database using the number
to determine how many fields it needed.

> 
>>>                                                    Then have a tracks
>>> table that has album id (which refers back to the albums table), track
>>> number, track title, etc.  You can then use a JOIN clause in your SQL
>>> SELECT statement to associate the track and album info with each other,
>>> or other queries to get the number of tracks in an album (e.g., "select
>>> count(*) from tracks where album_id = 1"), or whatever else you need to
>>> know about it.
>>
>> But if I understand this correctly I would need a separate table for
>> every album resulting in, potentially, thousands of tables. (OK, in
>> my case hundreds but others may like this idea, too, when I finish
>> the whole project.)
> 
> No, it's only one "tracks" table.  That's why there's an album id column
> in it:  so that each track is associated with one album.  There'd be
> (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
> on....  Each (album, track) tuple would have to be unique.  But you'd
> keep all the tracks for all the albums in one table.

I got that now.
But I guess I am still stuck with how I define how many tracks columns
there has to be for each album.  I think I have been away from this for
too long.  I am missing something that is probably both simple and very
apparent.

> 
>>>
>>> You probably want to read up on the concept of "foreign keys" and SQL
>>> JOIN clauses.  That's the usual way to do it in a database, and a big
>>> part of the relational model.
>> Not sure what the "foreign keys" have to do with it,
> 
> The foreign key (in this case, the use of album id in the tracks table)
> is what connects the individual tracks to the albums, and ensures that
> the album id actually exists.  By declaring album id (in the tracks
> table) as a foreign key referring to the albums table, Postgres will
> ensure that each track has a corresponding album.  Without declaring it
> as a foreign key, you can mistakenly have a track who's album id doesn't
> exist.

Got it, I understand that part now.  Like I said, been away from this
for much too long.  (Damn retirement.....)

> 
>> but I understand
>> the JOIN part.  I was just looking for an efficient way to do this and
>> somehow the though of hundreds to thousands of TABLES scares me.
> 
> It's just two:  albums and tracks.
> 
>> But, thanks for the info.  I will likely end out trying it a couple
>> of different ways before deciding which is best.
> 
> Another topic to read up on is "database normalization".
> 

I will.  Amazing how long it takes to learn and how fast you can
lose the learning when you stop doing it.

bill

[toc] | [prev] | [next] | [standalone]

#866

From	John-Paul Stewart <jpstewart@sympatico.ca>
Date	2019-04-27 17:10 -0400
Message-ID	<gijuppF7slsU1@mid.individual.net>
In reply to	#865

On 2019-04-27 4:29 p.m., Bill Gunshannon wrote:
> On 4/27/19 4:02 PM, John-Paul Stewart wrote:
>> On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
>>>
>>> But if I understand this correctly I would need a separate table for
>>> every album resulting in, potentially, thousands of tables. (OK, in
>>> my case hundreds but others may like this idea, too, when I finish
>>> the whole project.)
>>
>> No, it's only one "tracks" table.  That's why there's an album id column
>> in it:  so that each track is associated with one album.  There'd be
>> (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
>> on....  Each (album, track) tuple would have to be unique.  But you'd
>> keep all the tracks for all the albums in one table.
> 
> I got that now.
> But I guess I am still stuck with how I define how many tracks columns
> there has to be for each album.  I think I have been away from this for
> too long.  I am missing something that is probably both simple and very
> apparent.

By using a tracks table, it's no longer a question of "how many tracks
columns" but "how many tracks rows" instead. And the answer is "however
many you populate with the same album id".  That's the beauty of it.

[toc] | [prev] | [next] | [standalone]

#867

From	Bill Gunshannon <bill.gunshannon@gmail.com>
Date	2019-04-27 18:40 -0400
Message-ID	<gik42aF8vneU1@mid.individual.net>
In reply to	#866

On 4/27/19 5:10 PM, John-Paul Stewart wrote:
> On 2019-04-27 4:29 p.m., Bill Gunshannon wrote:
>> On 4/27/19 4:02 PM, John-Paul Stewart wrote:
>>> On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
>>>>
>>>> But if I understand this correctly I would need a separate table for
>>>> every album resulting in, potentially, thousands of tables. (OK, in
>>>> my case hundreds but others may like this idea, too, when I finish
>>>> the whole project.)
>>>
>>> No, it's only one "tracks" table.  That's why there's an album id column
>>> in it:  so that each track is associated with one album.  There'd be
>>> (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
>>> on....  Each (album, track) tuple would have to be unique.  But you'd
>>> keep all the tracks for all the albums in one table.
>>
>> I got that now.
>> But I guess I am still stuck with how I define how many tracks columns
>> there has to be for each album.  I think I have been away from this for
>> too long.  I am missing something that is probably both simple and very
>> apparent.
> 
> By using a tracks table, it's no longer a question of "how many tracks
> columns" but "how many tracks rows" instead. And the answer is "however
> many you populate with the same album id".  That's the beauty of it.
> 

Yeah, I went upstairs to watch the golf and think about it and
it hit me.  I was looking at it all wrong.  I was trying to label
the field for each track but being as they are merely sequentially
numbered values they don't need individual names.  Duh...

Hopefully other stuff will come back to the surface easier than this.

Thank you very much for kicking me in the head to knock this stuff
loose so it could float back up to the surface.

bill

[toc] | [prev] | [next] | [standalone]

#868

From	John-Paul Stewart <jpstewart@sympatico.ca>
Date	2019-04-27 19:26 -0400
Message-ID	<gik6ppF9ho2U1@mid.individual.net>
In reply to	#867

On 2019-04-27 6:40 p.m., Bill Gunshannon wrote:
> 
> Yeah, I went upstairs to watch the golf and think about it and
> it hit me.  I was looking at it all wrong.  

I know the feeling.  It's all too easy to get fixated on one idea and
then "not see the forest for the trees".  I've been there, done that far
more times than I'd care to admit.

You really don't want to know how badly I mis-used Postgres' arrays in
the past to achieve the result you initially described before I learned
about this stuff.

> Thank you very much for kicking me in the head to knock this stuff
> loose so it could float back up to the surface.

I'm sure there are plenty of topics where I'll need the kick in the
head.  And more where no amount of kicking will help me.  That's what I
like about these newsgroups:  they're a great place to get a second opinion.

Good luck!  Your album archive project sounds like more fun than
anything I'm working on.

[toc] | [prev] | [next] | [standalone]

#888

From	alexander.d.mills@gmail.com
Date	2020-02-21 14:02 -0800
Message-ID	<c9bc3eb5-42c2-491b-9570-cedba269bd41@googlegroups.com>
In reply to	#861

I have a similar question/answer on StackExchange: 
https://softwareengineering.stackexchange.com/questions/405567/how-to-use-strict-schema-with-seemingly-fluid-data-type

[toc] | [prev] | [next] | [standalone]

#889

From	Christian Barthel <bch@online.de>
Date	2020-02-22 09:15 +0100
Message-ID	<87eeunavho.fsf@barthel.ch>
In reply to	#888

alexander.d.mills@gmail.com writes:

> I have a similar question/answer on StackExchange: 
> https://softwareengineering.stackexchange.com/questions/405567/
> how-to-use-strict-schema-with-seemingly-fluid-data-type

I have seen that kind of modelling elsewhere.  This is basically
an EAV (Entity Attribute Value) table?  The book [1,2] describes
it as an anti-pattern which should be avoided because it is
harder to use, control and query later.

Example: The email should probably satisfy some constraints but
it is of type "json" and anything can be stored in it.  How about
typos in the "key" field etc.  Note that processing may consume
more CPU time as well according to Wikipedia [3] - depending on
the planned size of your database.

So, before using this pattern, I would think about whether it is
really necessary to have that flexibility (at the cost of the
disadvantages) or if it is possible to use normalization theory
and create relations for the entities.

| is there a better way to do this other than using JSON for the
| value column?

With the example shown, why are you using a "JSON" field instead
of a text field?  If you are using a JSON field, I would store
all emails in one JSON array field and make (user_id, key)
unique:

| id | user_id | key   | value                                |
|----|---------|-------|--------------------------------------|
| 1  | 1       | email | ["foo1@bar.com", "foo2@bar.com", ..] |
...

Why are there more than one email addresses for each user?  Do
they serve a specific purpose?  With your design, it seems
impossible to select some email addresses for certain actions
(which may be OK?).

An alternative version might be:  use a N:M mapping between
"user" and "email" (the "email" table contains a list of email
addresses a user may specify, i.e. "email at work", "private
mail", etc. and link the user with zero or more email
addresses).

(Further N:M mappings might be necessary for other "key"
values which might lead to a larger number of tables but column
constraints and queries would be simpler)

| If not - is there a way to enforce a schema on the JSON
| somehow?

You can do this with a trigger that runs before the insertion.
You have to carefully check the layout of the JSON and the
constraint of its values (i.e. if the key=email, then the value
field should have a certain pattern etc.).  I would also check
the "key" field and only allow existing keys (so that it becomes
possible to add "e-mail", "E-Mail", ... ).  PostgreSQL comes with
nice functions and operators to work with JSON [4].

| Last question - from my brief research the inverse table design
| is called an "unpivot" table - but if there is a  better name
| for it please let me know. 

"unpivot" sounds a bit "unspecific".  Personally, I'd name it
"user_meta_data" or "user_contacts" (if there are only email
addresses).

[1] D. Fontaine: Mastering PostgreSQL In Application Development
[2] https://tapoueh.org/blog/2018/03/database-modelization-anti-patterns/#entity-attribute-values
[3] https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
[4] https://www.postgresql.org/docs/11/functions-json.html
-- 
Christian Barthel <bch@online.de>

[toc] | [prev] | [standalone]

csiph-web

Table with a variable number of elements in a column

Contents

#861 — Table with a variable number of elements in a column

#862

#863

#864

#865

#866

#867

#868

#888

#889