From: Vincent Davis
Date: Thu, 23 Jan 2014 08:23:19 -0600
Subject: generate De Bruijn sequence memory and string vs lists
To: "python-list@python.org"
Newsgroups: comp.lang.python

For reference, the Wikipedia entry for De Bruijn sequences:
http://en.wikipedia.org/wiki/De_Bruijn_sequence

At the above link is a Python algorithm for generating De Bruijn sequences. It works fine, but it outputs a list of integers, [0, 0, 0, 1, 0, 1, 1, 1], and I would prefer a string, '00010111'. This can be accomplished by changing the last line from

    return sequence

to

    return ''.join([str(i) for i in sequence])

See de_bruijn_1 below.

The other option would be to manipulate strings directly (kind of). I butchered the original algorithm to do this. See de_bruijn_2 below. But it is much slower and ugly.

I want to make a few large De Bruijn sequences, hopefully on the order of de_bruijn(4, 50) to de_bruijn(4, 100) (wishful thinking?). I don't know the limits (memory or time) of the current algorithms. I think I will hit the memory maxsize limit at about 4^31. The system I will be using has 64GB RAM.
The size of a De Bruijn sequence is k^n.

My questions:

1. de_bruijn_2 is ugly. Any suggestions to do it better?
2. de_bruijn_2 is significantly slower than de_bruijn_1. Speedups?
3. Any thoughts on which is more memory efficient during computation?

#### 1 ####
def de_bruijn_1(k, n):
    """
    De Bruijn sequence for alphabet size k (0,1,2...k-1)
    and subsequences of length n.
    From wikipedia Sep 22 2013
    """
    a = [0] * k * n
    sequence = []
    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    sequence.append(a[j])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(int(a[t - p]) + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    #return sequence  #original
    return ''.join([str(i) for i in sequence])

d1 = de_bruijn_1(4, 8)

#### 2 ####
def de_bruijn_2(k, n):
    # db() rebinds both names through "global", so both must be declared here too
    global sequence, a
    a = '0' * k * n
    sequence = ''
    def db(t, p):
        global sequence
        global a
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    sequence = sequence + a[j]
        else:
            a = a[:t] + a[t - p] + a[t+1:]
            db(t + 1, p)
            for j in range(int(a[t - p]) + 1, k):
                a = a[:t] + str(j) + a[t+1:]
                db(t + 1, t)
        return sequence
    db(1, 1)
    return sequence

d2 = de_bruijn_2(4, 8)

Vincent Davis
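On questions 2 and 3, here is one possible direction, offered as a sketch rather than a definitive answer. It keeps the recursion from the Wikipedia algorithm but stores both the working word and the output in bytearray objects, which are mutable and use one byte per symbol, so no string is rebuilt on every step and no globals are needed. The names de_bruijn_bytes and max_order are made up for this illustration. The sketch assumes Python 3, an alphabet of at most 10 symbols (so each symbol is a single digit), and it reads "64GB" as 64 * 2**30 bytes.

```python
def de_bruijn_bytes(k, n):
    """De Bruijn sequence B(k, n) as a string of digits.

    Hypothetical variant for illustration; assumes 2 <= k <= 10.
    """
    if not 2 <= k <= 10:
        raise ValueError("single-digit symbols need 2 <= k <= 10")
    a = bytearray(n + 1)   # working word; only indices 0..n are ever touched
    out = bytearray()      # output, one byte per symbol (values 0..k-1)

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    # Shift symbol values 0..k-1 up to the ASCII digits '0'..'9'.
    return bytes(b + 48 for b in out).decode("ascii")


def max_order(k, ram_bytes):
    """Largest n with k**n <= ram_bytes, at one byte per symbol.

    Ignores all interpreter and working-copy overhead, so it is an upper bound.
    """
    n = 0
    while k ** (n + 1) <= ram_bytes:
        n += 1
    return n
```

de_bruijn_bytes(2, 3) gives '00010111', matching the list output quoted above. max_order(4, 64 * 2**30) gives 18, because 4**18 == 2**36 bytes is exactly 64 * 2**30. Since the sequence has k^n symbols, de_bruijn(4, 50) would need 4**50 == 2**100 symbols, which is out of reach for any approach that holds the whole sequence in memory.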