To: python-list@python.org
From: Peter Otten <__peter__@web.de>
Subject: Re: Processing a large string
Date: Fri, 12 Aug 2011 16:48:10 +0200
Organization: None
References: <b16af723-854c-449d-8b45-565d73579e17@br5g2000vbb.googlegroups.com> <j22oqv$9ro$1@solani.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.2220.1313160461.1164.python-list@python.org>

Peter Otten wrote:

> goldtech wrote:

>> Say I have a very big string with a pattern like:
>> 
>> akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd3dkdkdkdk3asnsn.....
>> 
>> I want to split the string into separate parts on the "3" and process
>> each part separately. I might run into memory limitations if I use
>> "split" and get a big array(?) I wondered if there's a way I could
>> read (stream?) the string from start to finish and read what's
>> delimited by the "3" into a variable, process the smaller string
>> variable then append/build a new string with the processed data?

> PS: This has come up before, but I couldn't find the relevant threads...

Alex Martelli a looong time ago:

> from __future__ import generators
> 
> def splitby(fileobj, splitter, bufsize=8192):
>     buf = ''
> 
>     while True:
>         try:
>             item, buf = buf.split(splitter, 1)
>         except ValueError:
>             more = fileobj.read(bufsize)
>             if not more: break
>             buf += more
>         else:
>             yield item + splitter
> 
>     if buf:
>         yield buf

http://mail.python.org/pipermail/python-list/2002-September/770673.html
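
For the original question the data is already a string rather than a file, so one way to reuse the generator above is to wrap the string in io.StringIO. What follows is only a sketch under a few assumptions: Python 3 (so the __future__ import is no longer needed), a made-up process() function standing in for whatever work is done on each part, and a deliberately tiny bufsize so that the refill path is exercised. Note that each yielded chunk still carries its trailing "3", as in Alex's version.

```python
import io


def splitby(fileobj, splitter, bufsize=8192):
    """Lazily yield pieces of fileobj, each ending with splitter.

    The last piece is yielded without a splitter if the data does
    not end with one.
    """
    buf = ''

    while True:
        try:
            # Unpacking raises ValueError when buf holds no splitter yet.
            item, buf = buf.split(splitter, 1)
        except ValueError:
            more = fileobj.read(bufsize)
            if not more:
                break
            buf += more
        else:
            yield item + splitter

    if buf:
        yield buf


def process(part):
    # Hypothetical placeholder for the real per-part processing.
    return part.upper()


data = "akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd3dkdkdkdk3asnsn"

# Only one chunk (plus the small read buffer) is held at a time
# while the result is being built.
result = "".join(
    process(chunk) for chunk in splitby(io.StringIO(data), "3", bufsize=16)
)
```

Because leftover text is kept in buf between reads, a part that straddles two reads is still returned whole. The input string itself of course stays in memory; the saving is that no list of all the parts is ever built.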