


Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin3!goblin2!goblin.stu.neva.ru!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <cantormath2@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
MIME-Version 1.0
X-Received by 10.221.18.70 with SMTP id qf6mr455430vcb.37.1382718067120; Fri, 25 Oct 2013 09:21:07 -0700 (PDT)
Date Fri, 25 Oct 2013 12:21:07 -0400
Subject Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?
From cantor cantor <cantormath2@gmail.com>
To python-list@python.org
Content-Type multipart/alternative; boundary=001a11333c2237543f04e99323b8
X-Mailman-Approved-At Fri, 25 Oct 2013 18:35:47 +0200
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.1525.1382718948.18130.python-list@python.org>
Lines 222
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1382718948 news.xs4all.nl 15960 [2001:888:2000:d::a6]:35388
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:57539




When I try to run lzma compression in parallel with multiprocessing (see the
code below), it hangs for a very long time. The non-parallel version of the
same code, using the built-in map(), works fine, as also shown below.

Python 3.3.2 [GCC 4.6.3] on linux

import lzma
from functools import partial
import multiprocessing


def run_lzma(data, c):
    # Feed one chunk of bytes to the given compressor object.
    return c.compress(data)


def split_len(seq, length):
    # Split a str into chunks of at most `length` characters, encoded as bytes.
    return [str.encode(seq[i:i+length]) for i in range(0, len(seq), length)]


def lzma_mp(sequence, threads=3):
    lzc = lzma.LZMACompressor()
    blocksize = int(round(len(sequence) / threads))
    strings = split_len(sequence, blocksize)
    # The compressor is bound into the partial, so pool.map() has to
    # pickle it in order to send each task to a worker process.
    lzc_partial = partial(run_lzma, c=lzc)
    pool = multiprocessing.Pool()
    lzc_pool = list(pool.map(lzc_partial, strings))
    pool.close()
    pool.join()
    out_flush = lzc.flush()
    return b"".join(lzc_pool + [out_flush])


sequence = ('AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAH'
            'IOAHIOWEHIOHEIOFEAFEASFEAFWEWWWWWWWWWWWWWWWWWWWWWWWWWWWWWEWFQWEWQWQGEWQFEWFDWEWEGEFGWEG')

if __name__ == "__main__":
    # This is the call that hangs (Python 3.3.2 on Linux).
    lzma_mp(sequence, threads=3)
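One factor that may be worth checking, stated here as an assumption rather than a confirmed diagnosis: pool.map() has to pickle lzc_partial, and with it the LZMACompressor object, to hand each task to a worker. A minimal sketch to test whether an object can be pickled at all (is_picklable is a hypothetical helper, not part of any library):

```python
import lzma
import pickle
from functools import partial


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


def run_lzma(data, c):
    # Same shape as the worker function in the question.
    return c.compress(data)


# Compressor objects do not support pickling, so both of these are expected
# to print False; plain bytes pickle fine.
print(is_picklable(lzma.LZMACompressor()))
print(is_picklable(partial(run_lzma, c=lzma.LZMACompressor())))
print(is_picklable(b"AAAAAJKDDDD"))
```

If the compressor cannot be pickled, the tasks never reach the workers intact, which would be consistent with the map() call never returning normally in the parallel version.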

Using the same compressor with the built-in map() instead of the pool works fine:

threads = 3
blocksize = int(round(len(sequence) / threads))
strings = split_len(sequence, blocksize)

# Feed the chunks, in order, to a single compressor in this process.
lzc = lzma.LZMACompressor()
out = list(map(lzc.compress, strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])

# Compare with compressing the whole sequence in one call.
print(lzma.compress(str.encode(sequence)) == result)
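Comparing against lzma.compress() byte for byte is a strict check. A looser one, which relies only on the output being a valid .xz stream, is to decompress it and compare with the input. A minimal sketch, where compress_in_chunks is a hypothetical helper mirroring the map() version:

```python
import lzma


def compress_in_chunks(chunks):
    """Feed byte chunks, in order, to one LZMACompressor; return one .xz stream."""
    comp = lzma.LZMACompressor()
    parts = [comp.compress(chunk) for chunk in chunks]
    # flush() finishes the stream; the compressor cannot be reused afterwards.
    parts.append(comp.flush())
    return b"".join(parts)


chunks = [b"AAAAAJKDDDD", b"GJFKSHFKLH", b"WWWWWWWWWW"]
packed = compress_in_chunks(chunks)
print(lzma.decompress(packed) == b"".join(chunks))  # True
```

This only works because one compressor sees every chunk in sequence within a single process.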

Using map() with the partial function works fine as well:

lzc = lzma.LZMACompressor()
lzc_partial = partial(run_lzma,c=lzc)
out = list(map(lzc_partial,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
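If the goal is parallel compression, one alternative, sketched here as an assumption and not as a fix confirmed in this thread, is to have each worker compress its chunk into a complete, independent .xz stream with lzma.compress() and to concatenate the results. Only bytes cross the process boundary. The output will not match lzma.compress() on the whole input, and since each stream has its own header and starts with an empty dictionary the ratio is usually somewhat worse, but lzma.decompress() reads concatenated streams back in order. split_bytes and compress_parallel are hypothetical helpers.

```python
import lzma
import multiprocessing


def split_bytes(data, n_chunks):
    """Split bytes into at most n_chunks roughly equal, non-empty pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data, processes=3):
    """Compress each chunk as its own complete .xz stream in a worker process.

    No compressor object is shared: workers receive bytes and return bytes,
    so nothing unpicklable has to be sent between processes.
    """
    chunks = split_bytes(data, processes)
    with multiprocessing.Pool(processes) as pool:
        streams = pool.map(lzma.compress, chunks)
    # Concatenated .xz streams form a valid multi-stream .xz payload.
    return b"".join(streams)


if __name__ == "__main__":
    data = b"AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAH" * 100
    packed = compress_parallel(data)
    # lzma.decompress() decodes every concatenated stream in sequence.
    print(lzma.decompress(packed) == data)
```

The trade-off is a slightly larger output in exchange for independent, order-preserving chunks that need no shared state.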
