Sparse sets

From	mhx@iae.nl (Marcel Hendrix)
Subject	Sparse sets
Newsgroups	comp.lang.forth
Message-ID	<90861509008435@frunobulax.edu> (permalink)
Date	2012-03-26 20:49 +0200
Organization	Wanadoo

Show all headers | View raw

I have a hard time believing the quoted advantages of sparse sets myself, 
but it is a nice exercise where one can drop in a small Forth twist. 

I like the code a lot better than what can be found for other languages.

-marcel

-- ------------------------------------------------------------------
(*
 * LANGUAGE    : ANS Forth with extensions
 * PROJECT     : Forth Environments
 * DESCRIPTION : Sparse Set operations 
 * CATEGORY    : Exercise 
 * AUTHOR      : Marcel Hendrix 
 * LAST CHANGE : March 25, 2012, Marcel Hendrix 
 *)



	NEEDS -miscutil

	REVISION -sparseset "--- Sparse Sets         Version 0.01 ---"

	PRIVATES

DOC
(*
 ( From http://programmingpraxis.com/ )

 A clever data structure exists for storing sparse sets of integers 
 on the range 0 .. u-1. Performing initialization, lookup, and insertion
 is O(1) while iteration is O(n), where n is the number of elements in 
 the set. 
 
 This data structure was studied in the 1993 article "An Efficient 
 Representation for Sparse Sets" by Preston Briggs and Linda Torczon, 
 in Exercise 1.9 (1.8 in the first edition) of Jon Bentley's book 
 "Programming Pearls," and in exercise 2.12 of the 1974 book "The Design
 and Analysis of Computer Algorithms" by Al Aho, John Hopcroft 
 and Jeffrey Ullman. 
 
 The data structure considers a universe of integers from 0 to u-1; 
 depending on the circumstances the integers probably map to 
 something else, but we don't care about that. Any given set consists 
 of n items chosen from the universe; there are no duplicates. Note 
 that n < u, certainly, and likely n is much less than u, otherwise, 
 you would probably use a bit vector to represent the set. Note also 
 that we are optimizing for speed at the expense of space, as a bit 
 vector takes u bits but our data structure takes 2u integers.
 
 Think about a bit vector. Setting a bit is a constant-time operation, 
 as is checking if a bit is set or unset. But initializing the bit 
 vector and iterating over its set elements each take time 
 proportional to the vector size. Our sparse set reduce the 
 iteration to time proportional to the size of the set (rather than
 the size of the universe) and reduce the initialization time 
 to a constant.
 
	  n
	  |
          v
 D: 5 1 4 _ _ _ _ _ _ _  
     \| |
      \ \
      |\ \
      | \ \
      |  \ \
      |   \ \
      |    \|
      |     \
      |     |\
      |     | |
 S: _ 1 _ _ 2 0 _ _ _ _ 
 ix 0 1 2 3 4 5 6 7 8 9 
 
 The sparse set is represented by two vectors that we will call dense 
 (abbreviated D) and sparse (abbreviated S). Initially n, the number of
 elements in the set, is zero; the two vectors are uninitialized and 
 may contain anything. To add an element 0 = k < u to a set that does 
 not already contain k, we set D[n] to k, S[k] to n, and increase n 
 by 1, an operation that takes constant time. After this, the two 
 vectors point to each other, which gives a test of set membership 
 that also works in constant time: an element k is in the set if and
 only if S[k] < n and D[S[k]] == k. Note that if k is not a member of
 the set, the value of S[k] doesn't matter; either S[k] will be greater 
 than n or it will point to an element of D that doesn't point back 
 to it. 
 
 The diagram above shows a set with the elements 5, 1 and 4; the '_' 
 boxes may contain any value. To iterate over the elements of the set, 
 read D[0 .. n-1], which takes time O(n), and to clear the set make 
 n = 0, which takes time O(1); note in particular that clearing the
 set doesn't require reinitialization. Other operations, including 
 size-of, delete, union, intersection, difference, and set-equality
 are possible, and equally time-efficient compared to bit vectors, 
 but we won't discuss them here, since they are seldom used with 
 this representation of sets. A common use of these sparse sets is 
 with register allocation algorithms in compilers, which have a 
 fixed universe (the number of registers in the machine) and are
 updated and cleared frequently during a single processing run.
 
 Comments:
  1) The INSERTed elements are visited in order of storage, something 
     impossible with a  bit vector.
  2) The elements to be stored must be unique. (Use LOOKUP before INSERT)
*)
ENDDOC

0 VALUE #universe PRIVATE
0 VALUE n         PRIVATE
0 VALUE Dense     PRIVATE
0 VALUE Sparse    PRIVATE

: SPARSE-SET ( sz | "name" -- )
	CREATE 	DUP , 2* CELLS ALLOT 
	DOES>  	@+ TO #universe TO Dense  
		Dense #universe CELLS + TO Sparse ;

	#16 SPARSE-SET test-set	test-set ( make it current )

-- Assume x is not already in the set.
: INSERT ( x -- )
	DUP #universe >= ABORT" INSERT :: out of range"
	DUP Dense n CELL[] !  
	Sparse []CELL  n SWAP !  
	1 +TO n ;  

: LOOKUP ( x -- bool )
	Sparse OVER CELL[] @ DUP n U>= IF  2DROP FALSE EXIT  ENDIF
	Dense []CELL @ = ; 

: ITERATE ( xt -- ) LOCAL xt  n 0 ?DO  Dense I CELL[] @ xt EXECUTE  LOOP ;
: CLEARS  ( -- )    CLEAR n ;

CLEARS
CR .( Insert {5,1,4} ) 5 INSERT  1 INSERT  4 INSERT
CR .( Lookup {5,1,4,15} => ) 5 LOOKUP . 1 LOOKUP . 4 LOOKUP . #15 LOOKUP .
CR .( Iterate with '.'  => ) ' . ITERATE

:ABOUT	CR ." Sparse Sets" 
     	CR ." Try: ( n -- )      INSERT" 
	CR ."      ( n -- bool ) LOOKUP " 
	CR ."      ( xt -- )     ITERATE " 
	CR ."      ( -- )        CLEARS " ;


		.ABOUT -sparseset CR
		DEPRIVE

                              (* End of Source *)

Back to comp.lang.forth | Previous | Next — Next in thread | Find similar

Thread

Sparse sets mhx@iae.nl (Marcel Hendrix) - 2012-03-26 20:49 +0200
  Re: Sparse sets anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-03-27 14:43 +0000
  Re: Sparse sets David Kuehling <dvdkhlng@gmx.de> - 2012-03-28 23:41 +0200

csiph-web