cluster or hash table advice needed

Newsgroups	comp.databases.ms-sqlserver
Date	Tue, 29 Sep 2015 12:06:05 -0700 (PDT)
Message-ID	<fa7449c7-daca-4a6d-81ba-603063464b4f@googlegroups.com>
Subject	cluster or hash table advise needed
From	"M.G." <michael@gurfinkel.us>

We are designing a table with high insert/delete activity. The table stores the sequence of actions for each experiment. These are its columns:

CREATE TABLE ACTION_SEQUENCE (
	/* ACTION_SEQUENCE_ID int NOT NULL, <<< questionable */
	EXP_ID int NOT NULL,
	ACT_SEQ int NOT NULL,
	ACT_ID int NOT NULL,
	MODIFIED_TIME datetime NULL,
	ACT_TYPE int NOT NULL
);

EXP_ID and ACT_ID are foreign keys into the experiments and actions tables, respectively.

Sample data for two experiments, with three rows for #100 and four rows for #200:
EXP_ID ACT_SEQ ACT_ID ACT_TYPE
100     1       233    0
100     2       560    0
100     3       233    1
200     1       220    0
200     2       220    1
200     3       778    0
200     4       778    1

EXP_ID values increase monotonically; the same is true of ACT_ID.

How we read data: we access one experiment at a time, with its actions sorted by ACT_SEQ, like this:
select * from ACTION_SEQUENCE where EXP_ID=@ID order by ACT_SEQ;
The order is essential here.

How we add or change data: we use a delete/insert approach (never UPDATE). Again, the records for one experiment are always deleted (if they exist) and inserted as a group, like this:
delete from ACTION_SEQUENCE where EXP_ID=@ID;
insert into ACTION_SEQUENCE(EXP_ID,ACT_SEQ,ACT_ID,ACT_TYPE) values (@ID,...);
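
Since the rows for one experiment are always replaced as a group, the delete and the insert presumably belong in a single transaction, so that a reader never sees a half-written experiment. A minimal sketch, assuming SQL Server 2008 or later; the value of @ID and the inserted rows are illustrative only (taken from the sample data above), not our actual code:

```sql
-- Sketch only: @ID and the inserted rows are hypothetical example values.
DECLARE @ID int = 100;

SET XACT_ABORT ON;   -- a run-time error rolls back the whole transaction
BEGIN TRANSACTION;

-- Remove any existing rows for this experiment (no-op if none exist).
DELETE FROM ACTION_SEQUENCE WHERE EXP_ID = @ID;

-- Re-insert the complete sequence for the experiment as one group.
INSERT INTO ACTION_SEQUENCE (EXP_ID, ACT_SEQ, ACT_ID, ACT_TYPE)
VALUES (@ID, 1, 233, 0),
       (@ID, 2, 560, 0),
       (@ID, 3, 233, 1);

COMMIT TRANSACTION;
```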

Expected number of records: around 10 million. Expected number of inserts (pure additions): around 5000 a day. Expected number of changes (delete/insert): around 1000.

Expected read/write ratio: 10 reads per update.

The question: would you make this a clustered table, or use no clustered index at all?
Current design: the table has an ACTION_SEQUENCE_ID primary key (clustered), and the sequence itself is maintained through an external sequence table (the reason is the app development framework).
I don't see any need for a clustered index here (or for its maintenance).
Logically, we could use (EXP_ID, ACT_SEQ, ACT_ID) as a unique key, but again, we never deal with individual records, only with groups of records for a given experiment.
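
To make the alternatives concrete, here is a rough sketch of the two options as I understand them. The index names are made up, and treating (EXP_ID, ACT_SEQ) as unique is an assumption based on the sample data, not something we enforce today. Option A also assumes the table has no other clustered index, i.e. the ACTION_SEQUENCE_ID key is dropped or made nonclustered. Only one of the two statements would be used.

```sql
-- Option A (assumption): cluster on the access path, so that all rows of
-- one experiment are stored together and already ordered by ACT_SEQ.
CREATE UNIQUE CLUSTERED INDEX CIX_ACTION_SEQUENCE_EXP_SEQ
    ON ACTION_SEQUENCE (EXP_ID, ACT_SEQ);

-- Option B (assumption): no clustered index (the table stays a heap), plus
-- a nonclustered index that covers the read query via included columns.
CREATE UNIQUE NONCLUSTERED INDEX IX_ACTION_SEQUENCE_EXP_SEQ
    ON ACTION_SEQUENCE (EXP_ID, ACT_SEQ)
    INCLUDE (ACT_ID, ACT_TYPE, MODIFIED_TIME);
```

Either way, the read by EXP_ID ordered by ACT_SEQ and the delete by EXP_ID could both seek on the leading EXP_ID column.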

Your input is highly appreciated.

Thread

cluster or hash table advise needed "M.G." <michael@gurfinkel.us> - 2015-09-29 12:06 -0700
  Re: cluster or hash table advise needed Erland Sommarskog <esquel@sommarskog.se> - 2015-09-29 22:15 +0200
    Re: cluster or hash table advise needed "M.G." <michael@gurfinkel.us> - 2015-09-29 13:55 -0700
