From: cantor cantor
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Fri, 25 Oct 2013 12:21:07 -0400
Subject: Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?

When I try to run lzma in parallel (see the code below), it hangs for a very long time. The non-parallel version of the code using map() works fine, as shown further down.

Python 3.3.2 [GCC 4.6.3] on linux

import lzma
from functools import partial
import multiprocessing


def run_lzma(data,c):
    return c.compress(data)


def split_len(seq, length):
    return [str.encode(seq[i:i+length]) for i in range(0, len(seq), length)]



def lzma_mp(sequence,threads=3):
  lzc = lzma.LZMACompressor()
  blocksize = int(round(len(sequence)/threads))
  strings = split_len(sequence, blocksize)
  lzc_partial = partial(run_lzma,c=lzc)
  pool=multiprocessing.Pool()
  lzc_pool = list(pool.map(lzc_partial,strings))
  pool.close()
  pool.join()
  out_flush = lzc.flush()
  return b"".join(lzc_pool + [out_flush])

sequence = 'AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAH IOAHIOWEHIOHEIOFEAFEASFEAFWEWWWWWWWWWWWWWWWWWWWWWWWWWWWWWEWFQWEWQWQGEWQFEWFDWEWEGEFGWEG'


lzma_mp(sequence,threads=3)
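
One difference from the built-in map() is that pool.map() has to pickle the callable and every chunk in order to send them to the worker processes, so the LZMACompressor bound into lzc_partial would have to survive pickling. The sketch below is only a diagnostic under that assumption, not a confirmed explanation of the hang. can_pickle is a hypothetical helper name, and the result for the compressor may differ between Python versions (recent CPython releases are expected to refuse to pickle it).

```python
import lzma
import pickle
from functools import partial


def run_lzma(data, c):
    return c.compress(data)


def can_pickle(obj):
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        # Any failure while pickling or unpickling counts as "not picklable".
        return False
    return True


if __name__ == "__main__":
    lzc = lzma.LZMACompressor()
    print("plain bytes:", can_pickle(b"AAAAAJK"))
    print("compressor: ", can_pickle(lzc))
    print("partial:    ", can_pickle(partial(run_lzma, c=lzc)))
```

If the compressor or the partial reports False, the object cannot be shipped to a worker process as is.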

Using lzma with the built-in map function works fine:

threads=3
blocksize = int(round(len(sequence)/threads))
strings = split_len(sequence, blocksize)


lzc = lzma.LZMACompressor()
out = list(map(lzc.compress,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
lzma.compress(str.encode(sequence))
lzma.compress(str.encode(sequence)) == result
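
A round trip through lzma.decompress() is another way to check the incremental output, without relying on a byte-for-byte match against lzma.compress(). A minimal sketch, with compress_incrementally as a hypothetical helper name and made-up sample chunks:

```python
import lzma


def compress_incrementally(chunks):
    """Feed byte chunks, in order, through a single LZMACompressor."""
    compressor = lzma.LZMACompressor()
    parts = [compressor.compress(chunk) for chunk in chunks]
    parts.append(compressor.flush())  # flush() emits whatever is still buffered
    return b"".join(parts)


chunks = [b"AAAAAJKDDDD", b"GJFKSHFKLH", b"WEWEGEFGWEG"]
packed = compress_incrementally(chunks)

# The joined output is one compressed stream that decompresses to the input.
assert lzma.decompress(packed) == b"".join(chunks)
```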

map() with the partial function works fine as well:

lzc = lzma.LZMACompressor()
lzc_partial = partial(run_lzma,c=lzc)
out = list(map(lzc_partial,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
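
If the goal is simply to spread the compression work over several processes, one option that avoids sharing a compressor object is to compress each chunk as its own independent stream with the module-level lzma.compress() and concatenate the results. This is a sketch of a different technique from the code above, not a fix for it: the output is not byte-identical to lzma.compress() on the whole input and will typically be somewhat larger, but lzma.decompress() accepts concatenated streams. split_bytes, compress_chunks and compress_parallel are hypothetical names.

```python
import lzma
import multiprocessing


def split_bytes(data, n_chunks):
    """Split data into at most n_chunks roughly equal byte chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_chunks(chunks, mapper=map):
    """Compress every chunk as an independent stream and join the streams."""
    return b"".join(mapper(lzma.compress, chunks))


def compress_parallel(data, workers=3):
    """Same as compress_chunks, but with the chunks handled by a process pool."""
    chunks = split_bytes(data, workers)
    with multiprocessing.Pool(workers) as pool:
        # Only bytes and a module-level function cross the process boundary.
        return compress_chunks(chunks, mapper=pool.map)
```

compress_parallel() is best called from under an `if __name__ == "__main__":` guard so that worker start-up does not re-run the calling code. Passing the default mapper=map to compress_chunks() gives the same result serially, which is convenient for checking that lzma.decompress() returns the original bytes.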