Re: Processing a large string

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: Processing a large string
Followup-To comp.lang.python
Date 2011-08-12 10:39 +0200
Organization None
Message-ID <j22oqv$9ro$1@solani.org>
References <b16af723-854c-449d-8b45-565d73579e17@br5g2000vbb.googlegroups.com>


goldtech wrote:

> Hi,
> 
> Say I have a very big string with a pattern like:
> 
> akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd3dkdkdkdk3asnsn.....
> 
> I want to split the string into separate parts on the "3" and process
> each part separately. I might run into memory limitations if I use
> "split" and get a big array(?)  I wondered if there's a way I could
> read (stream?) the string from start to finish and read what's
> delimited by the "3" into a variable, process the smaller string
> variable then append/build a new string with the processed data?
> 
> Would I loop it and read it char by char till a "3"...? Or?

You can read the file in chunks and split each chunk, carrying the
incomplete last part over to the next chunk:

from functools import partial

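def read_chunks(instream, chunksize=None):
    # Yield successive chunks until read() returns "" (end of a text-mode file).
    if chunksize is None:
        chunksize = 2**20
    return iter(partial(instream.read, chunksize), "")

def split_file(instream, delimiter, chunksize=None):
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        # Prepend the incomplete tail of the previous chunk.
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        # The last part may continue in the next chunk, so hold it back.
        leftover = parts.pop()
        for part in parts:
            yield part
    # Like str.split(), yield a final (possibly empty) part for empty input
    # or input that ends with the delimiter.
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover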

I hope I got the corner cases right.
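
One way to check those corner cases is to compare the generator's output with str.split() on in-memory streams. The following is only a sketch: it repeats the two functions so that it runs on its own, forwards chunksize to read_chunks so that small chunk sizes actually take effect, and uses io.StringIO in place of a real file. The helper name check and the sample strings are made up for illustration.

```python
import io
from functools import partial


def read_chunks(instream, chunksize=None):
    if chunksize is None:
        chunksize = 2**20
    # Two-argument iter(): call instream.read(chunksize) until it returns "".
    return iter(partial(instream.read, chunksize), "")


def split_file(instream, delimiter, chunksize=None):
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        leftover = parts.pop()  # may be incomplete; finish it with the next chunk
        for part in parts:
            yield part
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover


def check(text, delimiter="3"):
    # Hypothetical helper: tiny chunk sizes force parts (and multi-character
    # delimiters) to straddle chunk boundaries.
    for chunksize in (1, 2, 3, 7, 1024):
        got = list(split_file(io.StringIO(text), delimiter, chunksize))
        assert got == text.split(delimiter), (text, chunksize, got)


# Made-up samples: empty input, delimiter only, leading/trailing delimiters.
for sample in ["", "3", "33", "abc", "abc3", "3abc",
               "akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd"]:
    check(sample)

# A multi-character delimiter that gets cut in half by the chunking.
check("ab<>cd<>", delimiter="<>")
```

If the assertions pass, the generator agrees with str.split() for these inputs, including the trailing empty part when the data ends with the delimiter. For the original question, each part can then be processed as it is produced and written to an output file, so only one chunk plus one incomplete part has to be held in memory at a time.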

PS: This has come up before, but I couldn't find the relevant threads...


