Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?

Date 2013-10-25 12:21 -0400
Subject Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?
From cantor cantor <cantormath2@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1525.1382718948.18130.python-list@python.org> (permalink)


When I try to run lzma in parallel with multiprocessing (see the code below),
it hangs for a very long time. The non-parallel version using the built-in
map() works fine, as shown further down.

Python 3.3.2 [GCC 4.6.3] on linux

import lzma
from functools import partial
import multiprocessing

def run_lzma(data,c):
    return c.compress(data)

def split_len(seq, length):
    return [str.encode(seq[i:i+length]) for i in range(0, len(seq), length)]


def lzma_mp(sequence, threads=3):
    lzc = lzma.LZMACompressor()
    blocksize = int(round(len(sequence) / threads))
    strings = split_len(sequence, blocksize)
    lzc_partial = partial(run_lzma, c=lzc)
    pool = multiprocessing.Pool()
    lzc_pool = list(pool.map(lzc_partial, strings))
    pool.close()
    pool.join()
    out_flush = lzc.flush()
    return b"".join(lzc_pool + [out_flush])

sequence = ('AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAH'
            'IOAHIOWEHIOHEIOFEAFEASFEAFWEWWWWWWWWWWWWWWWWWWWWWWWWWWWWWEWFQWEWQWQGEWQFEWFDWEWEGEFGWEG')


lzma_mp(sequence,threads=3)

With lzma and the built-in map() function it works fine:

threads=3
blocksize = int(round(len(sequence)/threads))
strings = split_len(sequence, blocksize)


lzc = lzma.LZMACompressor()
out = list(map(lzc.compress,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
lzma.compress(str.encode(sequence))
lzma.compress(str.encode(sequence)) == result

The built-in map() with a partial function works fine as well:

lzc = lzma.LZMACompressor()
lzc_partial = partial(run_lzma,c=lzc)
out = list(map(lzc_partial,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
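
For comparison, here is a minimal sketch of a variant that does not share one LZMACompressor across processes. It rests on an assumption that the post above does not confirm: that handing the single compressor object to the pool workers is what goes wrong. Each chunk is compressed as its own complete .xz stream with lzma.compress(), and the streams are concatenated; lzma.decompress() accepts such concatenated streams. The names split_bytes, compress_chunk and lzma_mp_independent are hypothetical helpers, not part of the original code.

```python
import lzma
import multiprocessing


def split_bytes(data, n_chunks):
    """Split bytes into at most n_chunks pieces of roughly equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never 0
    # Fall back to a single (possibly empty) chunk so the result is never empty.
    return [data[i:i + size] for i in range(0, len(data), size)] or [data]


def compress_chunk(chunk):
    """Compress one chunk into a complete, self-contained .xz stream."""
    return lzma.compress(chunk)


def lzma_mp_independent(data, processes=3):
    """Compress chunks in worker processes and concatenate the streams.

    Hypothetical replacement for lzma_mp: no compressor object is shared,
    so only plain bytes travel between the parent and the workers.
    """
    chunks = split_bytes(data, processes)
    with multiprocessing.Pool(processes) as pool:
        return b"".join(pool.map(compress_chunk, chunks))
```

Called from under an if __name__ == "__main__": guard, lzma.decompress(lzma_mp_independent(str.encode(sequence))) should give back the original bytes. The output is not byte-identical to lzma.compress(str.encode(sequence)), since it consists of several streams rather than one, and the compression ratio is typically somewhat worse because each chunk is compressed without knowledge of the others.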
