Groups > comp.databases.ms-sqlserver > #477

Re: Generating a tree structure from data

From	--CELKO-- <jcelko212@earthlink.net>
Newsgroups	comp.databases.ms-sqlserver
Subject	Re: Generating a tree structure from data
Date	2011-06-23 07:36 -0700
Organization	http://groups.google.com
Message-ID	<91550d41-50c9-4f46-b0df-929e87960cae@e17g2000prj.googlegroups.com> (permalink)
References	<b8b37caa-5853-4a76-adc0-c5f0cc28a16a@a10g2000vbz.googlegroups.com>

Show all headers | View raw

>> I've been banging my head against a brick wall trying to sort this and hoping somebody can help!!! <<

People cannot read your mind, so post your code and clear specs if you
really want help. Please post real DDL and not narrative or your own
personal programming language. Learn to use ISO-11179 rules for the
data element names, avoid needless dialect and use ISO-8601 temporal
formats,  codes and so forth. Please tell us what SQL product and
release you are using. Tell us if you can change the DDL or if you are
stuck with it.

The vague narrative you did post is completely wrong. Columns are
nothing like fields. A data elemetn can be a “<something>_code” or a
“<something>_id” but never an absurd hybrid “id_code”. Do you also say
“blood_type” or “blood_type_code_value_id”?

The terms “parent”: and “link” belong to the pre-relational network
databases. They do not exist in RDBMS or SQL. We have referenced and
referencing columns.  My guess at your DDL is that you have a
incorrectly done adjacency list model. The nodes and they tree
st5rucutre are crammed itno one table in violation for the basic data
modeling rule never to mix relationship and data in a single table.

>> parent is either blank, Y (meaning there are child links [sic]) and H (meaning it's the top level of a structure) <<

First of all, we use a NULL, not a blank for missing data in SQL.
Blank fields were part of 1950's COBOL.

Secondly, those silly flags are carrying structural information at
different levels of aggregation.

>> i_code is effectively the parent id and id_code is the child id <<

Same data element with two names?  Again, another fundamental error.
The VIN on your automobile does not change from your insurance policy,
inspection sticker, or DMV records. Data elements have one and only
one name, but they might have a role; “child_node_id” and
“parent_node_id” would be valid.

>> id_desc is the description of the link [sic] <<

How does an identifier have a description? Entities have
descriptions.


>> I want to be able to create a procedure that recursively loops through the table (intradtl) and outputs an indented structure of the file <<

Procedural code instead of declarative code. Yep, back in the 1950's
again.

Another way of representing trees is to show them as nested sets.
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us
define a simple OrgChart table like this.

   CREATE TABLE OrgChart
  (emp_name CHAR(10) NOT NULL PRIMARY KEY,
  lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
  rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
  CONSTRAINT order_okay CHECK (lft < rgt));


  OrgChart
  emp_name lft rgt
  ======================
  'Albert' 1 12
  'Bert' 2 3
  'Chuck' 4 11
  'Donna' 5 6
  'Eddie' 7 8
  'Fred' 9 10

  The (lft, rgt) pairs are like tags in a mark-up language, or parens
in algebra, BEGIN-END blocks in Algol-family programming languages,
etc. -- they bracket a sub-set. This is a set-oriented approach to
trees in a set-oriented language.

 The organizational chart would look like this as a directed graph:

  Albert (1, 12)
  / \
  /  \
  Bert (2, 3)  Chuck (4, 11)
  / |  \
 /  | \
 /  | \
  / |  \

  Donna (5, 6) Eddie (7, 8) Fred (9, 10)

  The adjacency list table is denormalized in several ways. We are
modeling both the Personnel and the Organizational chart in one table.
But for the sake of saving space, pretend that the names are job
titles and that we have another table which describes the Personnel
that hold those positions.

  Another problem with the adjacency list model is that the
boss_emp_name and employee columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized table. To prove that this is not normalized,
assume that "Chuck" changes his name to "Charles"; you have to change
his name in both columns and several places. The defining
characteristic of a normalized table is that you have one fact, one
place, one time.

  The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.


  To show a tree as nested sets, replace the nodes with ovals, and
then nest subordinate ovals inside each other. The root will be the
largest oval and will contain every other node. The leaf nodes will be
the innermost ovals with nothing else inside them and the nesting will
show the hierarchical relationship. The (lft, rgt) columns (I cannot
use the reserved words LEFT and RIGHT in SQL) are what show the
nesting. This is like XML, HTML or parentheses.

  At this point, the boss_emp_name column is both redundant and
denormalized, so it can be dropped. Also, note that the tree structure
can be kept in one table and all the information about a node can be
put in a second table and they can be joined on employee number for
queries.

  To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a node, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each node will get two numbers, one of the right side and one
for the left. Computer Science majors will recognize this as a
modified preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss_emp_name column which used to represent the edges of a
graph.

  This has some predictable results that we can use for building
queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
are defined by the BETWEEN predicate; etc. Here are two common queries
which can be used to build others:

 1. An employee and all their Supervisors, no matter how deep the
tree.

  SELECT O2.*
  FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
  AND O1.emp_name = :myemployee;

   2. The employee and all their subordinates. There is a nice
symmetry here.

  SELECT O1.*
  FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
  AND O2.emp_name = :myemployee;

  3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

   SELECT O2.emp_name, SUM(S1.salary_amt)
  FROM OrgChart AS O1, OrgChart AS O2,
  Salaries AS S1
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt
  AND O1.emp_name = S1.emp_name
  GROUP BY O2.emp_name;

   4. To find the level of each emp_name, so you can print the tree as
an indented listing.

  SELECT T1.node,
  SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
  + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
  FROM Tree AS T1, Tree AS T2
  WHERE T2.lft <= T1.lft
  GROUP BY T1.node;

  An untested version of this using OLAP functions might be better
able to use the ordering.

  SELECT T1.node,
  SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
  + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
  OVER (ORDER BY T1.lft
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
  FROM Tree AS T1, Tree AS T2
  WHERE T2.lft <= T1.lft;

  5. The nested set model has an implied ordering of siblings which
the adjacency list model does not. To insert a new node, G1, under
part G. We can insert one node at a time like this:

   BEGIN ATOMIC
  DECLARE rightmost_spread INTEGER;
  SET rightmost_spread
  = (SELECT rgt
  FROM Frammis
  WHERE part = 'G');
  UPDATE Frammis
  SET lft = CASE WHEN lft > rightmost_spread
  THEN lft + 2
  ELSE lft END,
  rgt = CASE WHEN rgt >= rightmost_spread
  THEN rgt + 2
  ELSE rgt END
  WHERE rgt >= rightmost_spread;

  INSERT INTO Frammis (part, lft, rgt)
  VALUES ('G1', rightmost_spread, (rightmost_spread + 1));

  COMMIT WORK;
  END;

  The idea is to spread the (lft, rgt) numbers after the youngest
child of the parent, G in this case, over by two to make room for the
new addition, G1. This procedure will add the new node to the
rightmost child position, which helps to preserve the idea of an age
order among the siblings.

  6. To convert a nested sets model into an adjacency list model:

  SELECT B.emp_name AS boss_emp_name, E.emp_name
  FROM OrgChart AS E
  LEFT OUTER JOIN
  OrgChart AS B
  ON B.lft
  = (SELECT MAX(lft)
  FROM OrgChart AS S
  WHERE E.lft > S.lft
  AND E.lft < S.rgt);

  7. To find the immediate parent of a node:

  SELECT MAX(P2.lft), MIN(P2.rgt)
  FROM Personnel AS P1, Personnel AS P2
  WHERE P1.lft BETWEEN P2.lft AND P2.rgt
  AND P1.emp_name = :my_emp_name;

  I have a book on TREES & HIERARCHIES IN SQL which you can get at
Amazon.com right now.

Back to comp.databases.ms-sqlserver | Previous | Next — Previous in thread | Next in thread | Find similar

Thread

Generating a tree structure from data Steve <steve.simpson@aircelle.com> - 2011-06-22 02:40 -0700
  Re: Generating a tree structure from data Erland Sommarskog <esquel@sommarskog.se> - 2011-06-22 22:18 +0200
  Re: Generating a tree structure from data --CELKO-- <jcelko212@earthlink.net> - 2011-06-23 07:36 -0700
  Re: Generating a tree structure from data rembrandt <batesbrandt@gmail.com> - 2011-06-29 04:47 -0700

csiph-web