Re: Parsing logfile with multi-line loglines, separated by timestamp?

Newsgroups comp.lang.python
Date 2015-06-30 21:06 -0700
References <b8916490-20f3-4070-86dc-821adadd895b@googlegroups.com> <CANc-5Uw-w8pLssy0usPSpnVEPuHJQCz6RJyNfsiHNUwP=KpqpQ@mail.gmail.com> <mailman.192.1435680170.3674.python-list@python.org>
Message-ID <51f65e41-76e9-48c4-8f79-ba4ac060bbe3@googlegroups.com>
Subject Re: Parsing logfile with multi-line loglines, separated by timestamp?
From Victor Hooi <victorhooi@gmail.com>



Aha, cool, that's a good idea =) - it seems I should spend some time getting to know generators/iterators.

Also, sorry if this is basic, but once I have the "block" list itself, what is the best way to parse each relevant line?

In this case, the first line is a timestamp, the next two lines are system stats, then there is a blank line, and then one line for each block device.

I could just hardcode in the lines, but that seems ugly:

  for block in parse_iostat(f):
      for i, line in enumerate(block):
          if i == 0:
              print("timestamp is {}".format(line))
          elif i == 1 or i == 2:
              print("system stats: {}".format(line))
          elif i >= 4:
              print("disk stats: {}".format(line))

Is there a prettier or more Pythonic way of doing this?
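
One possibility, as a minimal sketch only: since the layout is fixed (timestamp, two system-stat lines, a blank line, then disks), the block could be unpacked by position instead of testing the index on every line. The helper name `describe_block` and the sample data are made up for illustration, and the starred unpacking needs Python 3.

```python
def describe_block(block):
    """Split one block into (timestamp, system_stats, disk_stats) by position."""
    # Assumed layout: timestamp, two system-stat lines, a blank line, then disks.
    # Raises ValueError if the block has fewer than four items.
    timestamp, stats_header, stats_values, _blank, *disks = block
    return timestamp, [stats_header, stats_values], disks


# Hypothetical sample block, shaped like the layout described above.
sample = [
    "06/30/2015 03:09:17 PM",
    "avg-cpu:  %user   %nice %system %iowait  %steal   %idle",
    "           0.03    0.00    0.03    0.00    0.00   99.94",
    "",
    "sda  0.00  0.00",
    "sdb  0.00  0.00",
]

timestamp, system_stats, disks = describe_block(sample)
print("timestamp is {}".format(timestamp))
for line in system_stats:
    print("system stats: {}".format(line))
for line in disks:
    print("disk stats: {}".format(line))
```

This still hardcodes the positions, but in one place, and a block that is too short fails loudly rather than being silently misread.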

Thanks,
Victor

On Wednesday, 1 July 2015 02:03:01 UTC+10, Chris Angelico wrote:
> On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
> > Maybe define a class which wraps a file-like object. Its next() method (or
> > is it __next__() method?) can just buffer up lines starting with one which
> > successfully parses as a timestamp, accumulates all the rest, until a blank
> > line or EOF is seen, then return that, either as a list of strings, one
> > massive string, or some higher level representation (presumably an instance
> > of another class) which represents one "paragraph" of iostat output.
> 
> next() in Py2, __next__() in Py3. But I'd do it, instead, as a
> generator - that takes care of all the details, and you can simply
> yield useful information whenever you have it. Something like this
> (untested):
> 
> import datetime
> 
> def parse_iostat(lines):
>     """Parse lines of iostat information, yielding ... something
> 
>     lines should be an iterable yielding separate lines of output
>     """
>     block = None
>     for line in lines:
>         line = line.strip()
>         try:
>             tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
>             if block: yield block
>             block = [tm]
>         except ValueError:
>             # It's not a new timestamp, so add it to the existing block
>             # (lines seen before the first timestamp are skipped)
>             if block is not None:
>                 block.append(line)
>     if block: yield block
> 
> This is a fairly classic line-parsing generator. You can pass it a
> file-like object, a list of strings, or anything else that it can
> iterate over; it'll yield some sort of aggregate object representing
> each time's block. In this case, all it does is append strings to a
> list, so this will result in a series of lists of strings, each one
> representing a single timestamp; you can parse the other lines in any
> way you like and aggregate useful data. Usage would be something like
> this:
> 
> with open("logfile") as f:
>     for block in parse_iostat(f):
>         pass  # do stuff with block
> 
> This will work quite happily with an ongoing stream, too, so if you're
> working with a pipe from a currently-running process, it'll pick stuff
> up just fine. (However, since it uses the timestamp as its signature,
> it won't yield anything till it gets the *next* timestamp. If the
> blank line is sufficient to denote the end of a block, you could
> change the loop to look for that instead.)
> 
> Hope that helps!
> 
> ChrisA
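
The blank-line variant mentioned at the end of the quoted message might look roughly like the sketch below. It assumes a blank line appears only at the end of a block, which does not hold for the iostat layout described above, where a blank line also separates the system stats from the device lines. The name `parse_blank_terminated` and the sample lines are hypothetical.

```python
def parse_blank_terminated(lines):
    """Yield lists of stripped lines, ending a block at each blank line."""
    block = []
    for line in lines:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            # Blank line: the current block is complete, so yield it right away
            # instead of waiting for the next timestamp to arrive.
            yield block
            block = []
    if block:
        # Input ended without a trailing blank line.
        yield block


# Hypothetical input: two blocks separated by a single blank line.
sample_lines = [
    "06/30/2015 03:09:17 PM\n",
    "sda 1 2\n",
    "\n",
    "06/30/2015 03:09:18 PM\n",
    "sda 3 4\n",
]
blocks = list(parse_blank_terminated(sample_lines))
```

With a live pipe, this version hands over each block as soon as its terminating blank line is read, at the cost of no longer checking that a block starts with a timestamp.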



