MIME-Version: 1.0
In-Reply-To: <CANc-5Uw-w8pLssy0usPSpnVEPuHJQCz6RJyNfsiHNUwP=KpqpQ@mail.gmail.com>
References: <b8916490-20f3-4070-86dc-821adadd895b@googlegroups.com> <CANc-5Uw-w8pLssy0usPSpnVEPuHJQCz6RJyNfsiHNUwP=KpqpQ@mail.gmail.com>
Date: Wed, 1 Jul 2015 02:02:41 +1000
Subject: Re: Parsing logfile with multi-line loglines, separated by timestamp?
From: Chris Angelico <rosuav@gmail.com>
Cc: Python <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.192.1435680170.3674.python-list@python.org>
Lines: 53
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:93334

On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
> Maybe define a class which wraps a file-like object. Its next() method (or
> is it __next__() method?) can just buffer up lines starting with one which
> successfully parses as a timestamp, accumulates all the rest, until a blank
> line or EOF is seen, then return that, either as a list of strings, one
> massive string, or some higher level representation (presumably an instance
> of another class) which represents one "paragraph" of iostat output.
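For comparison, Skip's class-based suggestion might be sketched like this. `IostatReader` and `TIMESTAMP_FORMAT` are hypothetical names made up for illustration, and this version splits on timestamps rather than blank lines. Because each call to `__next__` only sees the timestamp that *starts* the following block after it has already read it, the class has to stash that pending timestamp between calls. That is the kind of bookkeeping a generator handles for you. It defines `__next__` for Py3 and aliases `next` for Py2.

```python
import datetime

# Assumed timestamp format, matching the strptime call in parse_iostat
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


class IostatReader:
    """Hypothetical wrapper around any iterable of lines (e.g. a file).

    Each iteration returns one block: [datetime, line, line, ...].
    Lines before the first timestamp are skipped.
    """

    def __init__(self, lines):
        self._lines = iter(lines)
        self._pending = None  # timestamp already read for the next block

    def __iter__(self):
        return self

    def __next__(self):
        block = None
        if self._pending is not None:
            block = [self._pending]
            self._pending = None
        for line in self._lines:
            line = line.strip()
            try:
                tm = datetime.datetime.strptime(line, TIMESTAMP_FORMAT)
            except ValueError:
                # Not a timestamp: part of the current block, if one is open
                if block is not None:
                    block.append(line)
                continue
            if block is not None:
                # Next block has begun: remember its timestamp for later
                self._pending = tm
                return block
            block = [tm]
        if block is None:
            raise StopIteration
        return block  # last block, ended by EOF

    next = __next__  # Python 2 spelling of the same method
```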

next() in Py2, __next__() in Py3. But I'd do it as a generator
instead: that takes care of all the details, and you can simply
yield useful information whenever you have it. Something like this
(untested):

import datetime

def parse_iostat(lines):
    """Parse lines of iostat information, yielding ... something

    lines should be an iterable yielding separate lines of output
    """
    block = None
    for line in lines:
        line = line.strip()
        try:
            tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
        except ValueError:
            # It's not a new timestamp, so add it to the existing block
            # (anything before the first timestamp is skipped)
            if block is not None:
                block.append(line)
            continue
        # A new timestamp: emit the previous block and start a fresh one
        if block:
            yield block
        block = [tm]
    if block:
        yield block

This is a fairly classic line-parsing generator. You can pass it a
file-like object, a list of strings, or anything else that it can
iterate over; it'll yield some sort of aggregate object representing
each timestamp's block. In this case, all it does is collect lines
into a list, so the result is a series of lists, one per timestamp:
each list starts with the parsed datetime, followed by the stripped
lines that came after it. You can parse those other lines in any way
you like and aggregate useful data. Usage would be something like
this:

with open("logfile") as f:
    for block in parse_iostat(f):
        # do stuff with block
        print(block)
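As one way to "parse the other lines" of a block, here is a hypothetical `summarize_block` helper. It assumes the block contains an iostat-style device table whose header line starts with "Device" followed by column names, with one whitespace-separated row per device. The function name, the sample block, its column names and its numbers are all made up for illustration, not taken from real iostat output.

```python
import datetime


def summarize_block(block):
    """Hypothetical helper: turn one block from parse_iostat (a datetime
    followed by stripped lines) into (timestamp, {device: {column: value}}).

    Assumes a device table whose header line starts with "Device", followed
    by rows made of a device name and numeric columns.
    """
    timestamp, lines = block[0], block[1:]
    header = None  # column names, once the "Device" header has been seen
    devices = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0].startswith("Device"):
            header = fields[1:]
        elif header is not None and len(fields) == len(header) + 1:
            try:
                devices[fields[0]] = dict(zip(header, map(float, fields[1:])))
            except ValueError:
                pass  # not a numeric device row; ignore it
    return timestamp, devices


# Illustrative block shaped like parse_iostat's output (made-up numbers)
example = [
    datetime.datetime(2015, 7, 1, 2, 2, 41),
    "Device: r/s w/s",
    "sda 1.50 2.00",
    "",
]
timestamp, devices = summarize_block(example)
print(timestamp, devices)  # prints 2015-07-01 02:02:41 {'sda': {'r/s': 1.5, 'w/s': 2.0}}
```

Lines that appear before the "Device" header (such as a CPU summary) are simply ignored here; a real parser would handle whichever sections you actually care about.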

The generator will work quite happily with an ongoing stream, too, so
if you're working with a pipe from a currently running process, it'll
pick stuff up just fine. (However, since it uses the timestamp as its
signature, it won't yield a block until it sees the *next* timestamp.
If a blank line is sufficient to denote the end of a block, you could
change the loop to look for that instead.)
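If a blank line does reliably mark the end of a block, that change might look like the following sketch. `parse_iostat_blank` and `TIMESTAMP_FORMAT` are hypothetical names. It still starts a block at each timestamp, but yields as soon as a blank line arrives, so a live pipe gets each block without waiting for the next timestamp. It assumes blank lines never occur *inside* a block; if your output has blank lines between sections of one timestamp, the lines after such a gap would be dropped.

```python
import datetime

# Assumed timestamp format, same as in parse_iostat
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_iostat_blank(lines):
    """Hypothetical variant of parse_iostat: a block ends at a blank line
    (or at the next timestamp, or at EOF), whichever comes first."""
    block = None
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: assumed separator, so the open block is complete
            if block:
                yield block
            block = None
            continue
        try:
            tm = datetime.datetime.strptime(line, TIMESTAMP_FORMAT)
        except ValueError:
            # Ordinary line: keep it only while a block is open
            if block is not None:
                block.append(line)
            continue
        # New timestamp without a preceding blank line: close the old block
        if block:
            yield block
        block = [tm]
    if block:
        yield block  # EOF without a trailing blank line
```

Because the blank line triggers the `yield`, the consumer receives each block before the generator asks its input for another line, which is what matters when reading from a pipe.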

Hope that helps!

ChrisA