Groups > comp.lang.python > #64626 > unrolled thread

Re: generate De Bruijn sequence memory and string vs lists

Started by	Vincent Davis <vincent@vincentdavis.net>
First post	2014-01-23 11:10 -0600
Last post	2014-01-25 18:11 +1300
Articles	4 — 3 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: generate De Bruijn sequence memory and string vs lists Vincent Davis <vincent@vincentdavis.net> - 2014-01-23 11:10 -0600
    Re: generate De Bruijn sequence memory and string vs lists Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-01-24 21:29 +1300
      Re: generate De Bruijn sequence memory and string vs lists Vincent Davis <vincent@vincentdavis.net> - 2014-01-24 07:20 -0600
      Re: generate De Bruijn sequence memory and string vs lists Greg Ewing <greg.ewing@canterbury.ac.nz> - 2014-01-25 18:11 +1300

#64626 — Re: generate De Bruijn sequence memory and string vs lists

From	Vincent Davis <vincent@vincentdavis.net>
Date	2014-01-23 11:10 -0600
Subject	Re: generate De Bruijn sequence memory and string vs lists
Message-ID	<mailman.5903.1390500573.18130.python-list@python.org>

[Multipart message — attachments visible in raw view] — view raw

On 1/23/14, 10:18 AM, Dave Angel wrote:
> (something about your message seems to make it unquotable)

Not sure why the message was not quotable. I sent it using gmail.

On 1/23/14, 10:18 AM, Dave Angel wrote:
> 64gig is 4^18, so you can forget about holding a string of size 4^50

I guess I will have to buy more memory or be happy with less, 4**17 would
be ok.

On 1/23/14, 10:18 AM, Dave Angel wrote:
> If memory size is your issue,  why not make the function a
>   generator,  by replacing the append with a yield?

I plan to use the sequence as an index to count occurrences of sequences of
length n. A generator is equivalent to using itertools.permutations (i
think that the right itertool). My thought is that I don't have to store
each individual (sub)sequence since the De Brujin sequence contains all of
them. i.e. it is a compact representation of every sequence generated by
itertools.permutations.

Vincent Davis

On Thu, Jan 23, 2014 at 10:18 AM, Dave Angel <davea@davea.name> wrote:
>
>  Vincent Davis <vincent@vincentdavis.net> Wrote in message:
> >
> (something about your message seems to make it unquotable)
>
> 64gig is 4^18, so you can forget about holding a string of size 4^50
>
> If memory size is your issue,  why not make the function a
>  generator,  by replacing the append with a yield?
>
>
> --
> DaveA
>
> --
> https://mail.python.org/mailman/listinfo/python-list

[toc] | [next] | [standalone]

#64668

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2014-01-24 21:29 +1300
Message-ID	<bkemj0F48q8U1@mid.individual.net>
In reply to	#64626

Vincent Davis wrote:
> I plan to use the sequence as an index to count occurrences of sequences 
> of length n.

If all you want is a mapping between a sequence of
length n and compact representation of it, there's
a much simpler way: just convert it to a base-k
integer, where k is the size of the alphabet.

The resulting integer won't be any larger than an
index into the de Bruijn sequence would be, and
you can easily recover the original sequence from
its encoding without needing any kind of lookup
table.

-- 
Greg

[toc] | [prev] | [next] | [standalone]

#64680

From	Vincent Davis <vincent@vincentdavis.net>
Date	2014-01-24 07:20 -0600
Message-ID	<mailman.5938.1390569680.18130.python-list@python.org>
In reply to	#64668

[Multipart message — attachments visible in raw view] — view raw

On Fri, Jan 24, 2014 at 2:29 AM, Gregory Ewing
<greg.ewing@canterbury.ac.nz>wrote:

> If all you want is a mapping between a sequence of
> length n and compact representation of it, there's
> a much simpler way: just convert it to a base-k
> integer, where k is the size of the alphabet.
>
> The resulting integer won't be any larger than an
> index into the de Bruijn sequence would be, and
> you can easily recover the original sequence from
> its encoding without needing any kind of lookup
> table.
>

True, the "all you want is a mapping" is not quite true. I actually plan
to plot frequency (the number of times an observed sub sequence overlaps a
value in the De Bruijn sequence) The way the sub sequences overlap is
important to me and I don't see a way go from base-k (or any other base) to
the index location in the De Bruijn sequence. i.e. a decoding algorithm.

Vincent Davis
720-301-3003

[toc] | [prev] | [next] | [standalone]

#64721

From	Greg Ewing <greg.ewing@canterbury.ac.nz>
Date	2014-01-25 18:11 +1300
Message-ID	<mailman.5963.1390626709.18130.python-list@python.org>
In reply to	#64668

Vincent Davis wrote:
> True, the "all you want is a mapping" is not quite true. I actually 
> plan to plot frequency (the number of times an observed sub sequence 
> overlaps a value in the De Bruijn sequence) The way the sub sequences 
> overlap is important to me and I don't see a way go from base-k (or any 
> other base) to the index location in the De Bruijn sequence. i.e. 
> a decoding algorithm.

So the order or position in which the subsequences appear
in the de Bruijn sequence is important?

Can you explain more about what you're trying to do? Maybe
give a scaled-down example?

Whatever it is, it seems like there should be a more
efficient way than materialising the whole umpteen-gigabyte
sequence.

-- 
Greg

[toc] | [prev] | [standalone]

csiph-web

Re: generate De Bruijn sequence memory and string vs lists

Contents

#64626 — Re: generate De Bruijn sequence memory and string vs lists

#64668

#64680

#64721