Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #86063 > unrolled thread

Algorithm for Creating Supersets of Smaller Sets Based on Common Elements

Started by"TommyVee" <xxxxxxxx@xxxxxx.xxx>
First post2015-02-21 14:46 -0500
Last post2015-02-22 17:02 +0000
Articles 10 — 9 participants

Back to article view | Back to comp.lang.python


Contents

  Algorithm for Creating Supersets of Smaller Sets Based on Common Elements "TommyVee" <xxxxxxxx@xxxxxx.xxx> - 2015-02-21 14:46 -0500
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Joel Goldstick <joel.goldstick@gmail.com> - 2015-02-21 15:06 -0500
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Ethan Furman <ethan@stoneleaf.us> - 2015-02-21 12:11 -0800
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Dave Angel <davea@davea.name> - 2015-02-21 15:18 -0500
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-21 22:40 +0000
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements "TommyVee" <xxxxxxxx@xxxxxx.xxx> - 2015-02-21 19:07 -0500
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-22 15:02 +1100
      Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements wxjmfauth@gmail.com - 2015-02-22 00:15 -0800
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements Peter Pearson <pkpearson@nowhere.invalid> - 2015-02-22 16:49 +0000
    Re: Algorithm for Creating Supersets of Smaller Sets Based on Common Elements duncan smith <buzzard@invalid.invalid> - 2015-02-22 17:02 +0000

#86063 — Algorithm for Creating Supersets of Smaller Sets Based on Common Elements

From"TommyVee" <xxxxxxxx@xxxxxx.xxx>
Date2015-02-21 14:46 -0500
SubjectAlgorithm for Creating Supersets of Smaller Sets Based on Common Elements
Message-ID<Bg5Gw.1344030$No4.494335@fx19.iad>
Start off with sets of elements as follows:

1. A,B,E,F
2. G,H,L,P,Q
3. C,D,E,F
4. E,X,Z
5. L,M,R
6. O,M,Y

Note that sets 1, 3 and 4 all have the element 'E' in common, therefore they 
are "related" and form the following superset:

A,B,C,D,E,F,X,Z

Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 have 
element 'M' in common, therefore they form the following superset:

G,H,L,M,O,P,Q,R,Y

I think you get the point.  As long as sets have at least 1 common element, 
they combine to form a superset.  Also "links" (common elements) between 
sets may go down multiple levels, as described in the second case above 
(2->5->6).  Cycles thankfully, are not possible.

BTW, the number of individual sets (and resultant supersets) will be very 
large.

I don't know where to start with this.  I thought about some type of 
recursive algorithm, but I'm not sure.  I could figure out the Python 
implementation easy enough, I'm just stumped on the algorithm itself.

Anybody have an idea?

Thanks, Tom 

[toc] | [next] | [standalone]


#86064

FromJoel Goldstick <joel.goldstick@gmail.com>
Date2015-02-21 15:06 -0500
Message-ID<mailman.18981.1424549202.18130.python-list@python.org>
In reply to#86063
On Sat, Feb 21, 2015 at 2:46 PM, TommyVee <xxxxxxxx@xxxxxx.xxx> wrote:
> Start off with sets of elements as follows:
>
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
>
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore they
> are "related" and form the following superset:
>
> A,B,C,D,E,F,X,Z
>
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 have
> element 'M' in common, therefore they form the following superset:
>
> G,H,L,M,O,P,Q,R,Y
>
> I think you get the point.  As long as sets have at least 1 common element,
> they combine to form a superset.  Also "links" (common elements) between
> sets may go down multiple levels, as described in the second case above
> (2->5->6).  Cycles thankfully, are not possible.
>
> BTW, the number of individual sets (and resultant supersets) will be very
> large.
>
> I don't know where to start with this.  I thought about some type of
> recursive algorithm, but I'm not sure.  I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
>
> Anybody have an idea?
>

start with reading about python sets.  If you take the intersection of
two sets it will return a set with common elements.  If that is empty,
they don't pass your test.  If you take the union, you get a set with
the set values in each.


> Thanks, Tom
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Joel Goldstick
http://joelgoldstick.com

[toc] | [prev] | [next] | [standalone]


#86065

FromEthan Furman <ethan@stoneleaf.us>
Date2015-02-21 12:11 -0800
Message-ID<mailman.18982.1424549507.18130.python-list@python.org>
In reply to#86063

[Multipart message — attachments visible in raw view] — view raw

On 02/21/2015 11:46 AM, TommyVee wrote:
> Start off with sets of elements as follows:
> 
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
> 
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore they are "related" and form the following superset:
> 
> A,B,C,D,E,F,X,Z
> 
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 have element 'M' in common, therefore they form
> the following superset:
> 
> G,H,L,M,O,P,Q,R,Y
> 
> I think you get the point.  As long as sets have at least 1 common element, they combine to form a superset.  Also
> "links" (common elements) between sets may go down multiple levels, as described in the second case above (2->5->6). 
> Cycles thankfully, are not possible.
> 
> BTW, the number of individual sets (and resultant supersets) will be very large.
> 
> I don't know where to start with this.  I thought about some type of recursive algorithm, but I'm not sure.  I could
> figure out the Python implementation easy enough, I'm just stumped on the algorithm itself.
> 
> Anybody have an idea?

Use a Counter (collections.Counter), add all sets, then keep any keys with a count of 2 or more.

--
~Ethan~

[toc] | [prev] | [next] | [standalone]


#86066

FromDave Angel <davea@davea.name>
Date2015-02-21 15:18 -0500
Message-ID<mailman.18983.1424549962.18130.python-list@python.org>
In reply to#86063
On 02/21/2015 02:46 PM, TommyVee wrote:
> Start off with sets of elements as follows:
>
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
>
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
>
> A,B,C,D,E,F,X,Z
>
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6
> have element 'M' in common, therefore they form the following superset:
>
> G,H,L,M,O,P,Q,R,Y
>
> I think you get the point.  As long as sets have at least 1 common
> element, they combine to form a superset.  Also "links" (common
> elements) between sets may go down multiple levels, as described in the
> second case above (2->5->6).  Cycles thankfully, are not possible.
>
> BTW, the number of individual sets (and resultant supersets) will be
> very large.
>
> I don't know where to start with this.

I can't see where you've defined "this" yet.  I think you're trying to 
find all combinations of sets that are "related", and that you're 
assuming some of them won't be.  So perhaps you're trying to build 
"islands" of related sets, and you want each of those islands to be of 
maximal size.

>  I thought about some type of
> recursive algorithm, but I'm not sure.  I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
>

Seems like you can have a doubly-nested loop, where your inner loop 
collects all the sets that are related to the one in the outer loop. 
You label each group with an island-number, where the island number is 
simply the lowest number set that's on the island.

It's still not clear what you do with each of those islands, but the 
"superset" you refer to before is simply the OR of everything on the island.

Perhaps you want to show all smaller supersets that are composed of at 
least two sets of an island.  That becomes a fairly simple power-of-two 
iteration of those sets, where you throw out the case where the index is 
a power of two (ie. contains only one set).

As you say, calculating the union or intersection of the various sets in 
python is trivial.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]


#86071

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2015-02-21 22:40 +0000
Message-ID<mailman.18984.1424558476.18130.python-list@python.org>
In reply to#86063
On 21/02/2015 19:46, TommyVee wrote:
> Start off with sets of elements as follows:
>
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
>
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
>
> A,B,C,D,E,F,X,Z
>
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6
> have element 'M' in common, therefore they form the following superset:
>
> G,H,L,M,O,P,Q,R,Y
>
> I think you get the point.  As long as sets have at least 1 common
> element, they combine to form a superset.  Also "links" (common
> elements) between sets may go down multiple levels, as described in the
> second case above (2->5->6).  Cycles thankfully, are not possible.
>
> BTW, the number of individual sets (and resultant supersets) will be
> very large.
>
> I don't know where to start with this.  I thought about some type of
> recursive algorithm, but I'm not sure.  I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
>
> Anybody have an idea?
>
> Thanks, Tom

A naive approach but should give you something to think about.

from collections import defaultdict

sets = ({'A','B','E','F'},
{'G','H','L','P','Q'},
{'C','D','E','F'},
{'E','X','Z'},
{'L','M','R'},
{'O','M','Y'})

d = defaultdict(list)

for i, aSet in enumerate(sets):
     for a in aSet:
         d[a].append(i)

superSets = []
for k in d:
     if len(d[k]) > 1:
         superSet = set()
         for i in d[k]:
             superSet |= sets[i]
         superSets.append(superSet)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#86073

From"TommyVee" <xxxxxxxx@xxxxxx.xxx>
Date2015-02-21 19:07 -0500
Message-ID<w59Gw.1082219$Lq6.132077@fx29.iad>
In reply to#86063
"TommyVee"  wrote in message news:Bg5Gw.1344030$No4.494335@fx19.iad...

Start off with sets of elements as follows:

1. A,B,E,F
2. G,H,L,P,Q
3. C,D,E,F
4. E,X,Z
5. L,M,R
6. O,M,Y

Note that sets 1, 3 and 4 all have the element 'E' in common, therefore they
are "related" and form the following superset:

A,B,C,D,E,F,X,Z

Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 have
element 'M' in common, therefore they form the following superset:

G,H,L,M,O,P,Q,R,Y

I think you get the point.  As long as sets have at least 1 common element,
they combine to form a superset.  Also "links" (common elements) between
sets may go down multiple levels, as described in the second case above
(2->5->6).  Cycles thankfully, are not possible.

BTW, the number of individual sets (and resultant supersets) will be very
large.

I don't know where to start with this.  I thought about some type of
recursive algorithm, but I'm not sure.  I could figure out the Python
implementation easy enough, I'm just stumped on the algorithm itself.

Anybody have an idea?

Thanks, Tom

Thanks to all!  Not only did you turn around my thinking on this, but I also 
learned about some new Python collection types.  I think I have a way to get 
this to work now.

[toc] | [prev] | [next] | [standalone]


#86076

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2015-02-22 15:02 +1100
Message-ID<54e954d6$0$13013$c3e8da3$5496439d@news.astraweb.com>
In reply to#86063
TommyVee wrote:

> Start off with sets of elements as follows:
> 
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
> 
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
> 
> A,B,C,D,E,F,X,Z

Sounds to me like you already have an algorithm in mind. It might not be
correct or complete, or it might not be efficient, but you performed some
steps in your own mind to generate that superset. Write down those steps
and you have an algorithm. Once you have the algorithm, you can then turn
it into Python code and start revising it from there.

I went back to first principles and drew a set diagram with multiple
disjoint sets of overlapping sets, and then thought about the process of
*adding* each new set to the set-of-supersets. It's hard to describe, but
if you start drawing an empty Venn diagram, then add one of your sets to
the Venn diagram at a time, merging that set to whatever superset it
belongs to, the process should appear quite obvious.

Let me walk through the process with your six sets above, plus one extra.
First, I create a list of supersets, initially empty:

supersets = []

I look at Set #1, and compare to all the supersets. There aren't any, so I
make a copy of Set #1 and add it to the supersets:

[{A,B,E,F}]

I look at Set #2, and compare it to all the supersets. It doesn't intersect
any of them, so I add it as a new superset:

[{A,B,E,F}, {G,H,L,P,Q}]

I look at Set #3. It intersects the first superset, but not the second, so I
merge them. Likewise for Set #4:

[{A,B,C,D,E,F}, {G,H,L,P,Q}]
[{A,B,C,D,E,F,X,Z}, {G,H,L,P,Q}]

Set #5 intersects the second superset, but not the first. Same for Set #6:

[{A,B,C,D,E,F,X,Z}, {G,H,L,M,P,Q,R}]
[{A,B,C,D,E,F,X,Z}, {G,H,L,M,O,P,Q,R,Y}]

Now let me add an extra Set #7, {A, R}, which intersects *both* supersets.
First I merge it with both:

[{A,B,C,D,E,F,R,X,Z}, {A,G,H,L,M,O,P,Q,R,Y}]

but I don't stop there. Then I recursively run through the supersets with
the same merging process to get:

[{A,B,C,D,E,F,G,H,L,M,O,P,Q,R,X,Y,Z}]

which is the single minimal, disjoint superset of all seven sets.



def merge(sets):
    """Merge a collection of sets into a list of minimal, disjoint
    supersets.
    """
    supersets = []
    for a in sets:
        count = 0
        for s in supersets:
            if a&s:
                s.update(a)
                count += 1
        if count == 0:
            supersets.append(a.copy())
        elif count > 1:
            supersets = merge(supersets)
    return supersets


Testing this with your six sets gives:

py> sets = [set(s) for s in ("ABEF", "GHLPQ", "CDEF", "EXZ", "LMR", "OMY")]
py> for s in merge(sets):
...     print(','.join(sorted(s)))
...
A,B,C,D,E,F,X,Z
G,H,L,M,O,P,Q,R,Y



Adding set #7 gives:

py> for s in merge(sets + [{'A', 'R'}]):
...     print(','.join(sorted(s)))
...
A,B,C,D,E,F,G,H,L,M,O,P,Q,R,X,Y,Z





-- 
Steven

[toc] | [prev] | [next] | [standalone]


#86086

Fromwxjmfauth@gmail.com
Date2015-02-22 00:15 -0800
Message-ID<503257ca-a61a-4aaf-be09-826c553b3f31@googlegroups.com>
In reply to#86076
>>> z = [set(s) for s in ("ABEF", "GHLPQ", "CDEF", "EXZ", "LMR", "OMY")]
>>> x = set()
>>> for e in z:
...     x = x.union(e)
...     
>>> x
{'A', 'C', 'B', 'E', 'D', 'G', 'F', 'H', 'M', 'L', 'O', 'Q', 'P', 'R', 'Y', 'X', 'Z'}
>>> ''.join(x)
'ACBEDGFHMLOQPRYXZ'
>>>

[toc] | [prev] | [next] | [standalone]


#86135

FromPeter Pearson <pkpearson@nowhere.invalid>
Date2015-02-22 16:49 +0000
Message-ID<ckufkqFd796U1@mid.individual.net>
In reply to#86063
On Sat, 21 Feb 2015 14:46:26 -0500, TommyVee wrote:
> Start off with sets of elements as follows:
>
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
>
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore they 
> are "related" and form the following superset:
>
> A,B,C,D,E,F,X,Z
>
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 have 
> element 'M' in common, therefore they form the following superset:
>
> G,H,L,M,O,P,Q,R,Y
>
> I think you get the point.
[snip]

I recommend continuing to work on your statement of the problem until it
is detailed, precise, and complete -- something along the lines of,
"Given a set of sets, return a set of sets having the following
properties: (1)... (2)..."  This approach often brings to light logical
problems in the loosely sketched requirements.  It also produces the
outline of a testing regimen to determine whether an implemented
solution is correct.

-- 
To email me, substitute nowhere->runbox, invalid->com.

[toc] | [prev] | [next] | [standalone]


#86136

Fromduncan smith <buzzard@invalid.invalid>
Date2015-02-22 17:02 +0000
Message-ID<54ea0ba0$0$23001$862e30e2@ngroups.net>
In reply to#86063
On 21/02/15 19:46, TommyVee wrote:
> Start off with sets of elements as follows:
> 
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
> 
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
> 
> A,B,C,D,E,F,X,Z
> 
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6
> have element 'M' in common, therefore they form the following superset:
> 
> G,H,L,M,O,P,Q,R,Y
> 
> I think you get the point.  As long as sets have at least 1 common
> element, they combine to form a superset.  Also "links" (common
> elements) between sets may go down multiple levels, as described in the
> second case above (2->5->6).  Cycles thankfully, are not possible.
> 
> BTW, the number of individual sets (and resultant supersets) will be
> very large.
> 
> I don't know where to start with this.  I thought about some type of
> recursive algorithm, but I'm not sure.  I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
> 
> Anybody have an idea?
> 
> Thanks, Tom

Tom,
    You could construct a graph G such that there exists an edge {v,w}
for each pair of items that appear in a common set. Each connected
component will contain the nodes of one of the supersets you're looking
for. That's unnecessarily expensive, so you can adopt a similar approach
using trees.

http://en.wikipedia.org/wiki/Disjoint-set_data_structure

Construct a forest of trees (initially one tree for each item) and join
pairs of trees until you have one tree for each of your supersets. For
each set of size n you only need to consider n-1 joins. That will ensure
that any pair of items that are in one of the sets are contained in a
single tree. The find and union operations are amortized constant, so
this should be efficient for your large numbers of sets.

The union_find module can be found at
https://github.com/DuncanSmith147/KVMS. Cheers.

Duncan


>>> import union_find
>>> sets = (['A','B','E','F'],
['G','H','L','P','Q'],
['C','D','E','F'],
['E','X','Z'],
['L','M','R'],
['O','M','Y'])
>>> uf = union_find.UnionFindTree()
>>> for a_set in sets:
	for item in a_set:
		try:
			uf.add(item)
		except ValueError:
			pass
	n = a_set[0]
	for item in a_set[1:]:
		is_merged = uf.union(n, item)

		
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for item in uf:
	res[uf.find(item)].append(item)

	
>>> res.values()
[['G', 'H', 'M', 'L', 'O', 'Q', 'P', 'R', 'Y'], ['A', 'C', 'B', 'E',
'D', 'F', 'X', 'Z']]
>>>

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web