Groups > comp.lang.python > #93330 > unrolled thread
| Started by | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| First post | 2015-06-30 08:24 -0700 |
| Last post | 2015-07-01 15:03 +1000 |
| Articles | 4 — 2 participants |
Parsing logfile with multi-line loglines, separated by timestamp? Victor Hooi <victorhooi@gmail.com> - 2015-06-30 08:24 -0700
Re: Parsing logfile with multi-line loglines, separated by timestamp? Chris Angelico <rosuav@gmail.com> - 2015-07-01 02:02 +1000
Re: Parsing logfile with multi-line loglines, separated by timestamp? Victor Hooi <victorhooi@gmail.com> - 2015-06-30 21:06 -0700
Re: Parsing logfile with multi-line loglines, separated by timestamp? Chris Angelico <rosuav@gmail.com> - 2015-07-01 15:03 +1000
| From | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| Date | 2015-06-30 08:24 -0700 |
| Subject | Parsing logfile with multi-line loglines, separated by timestamp? |
| Message-ID | <b8916490-20f3-4070-86dc-821adadd895b@googlegroups.com> |
Hi,
I'm trying to parse iostat -xt output using Python. The quirk with iostat is that the output for each second runs over multiple lines. For example:
06/30/2015 03:09:17 PM
avg-cpu:    %user    %nice  %system  %iowait   %steal    %idle
             0.03     0.00     0.03     0.00     0.00    99.94

Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
xvdap1       0.00     0.04     0.02     0.07     0.30     3.28    81.37     0.00    29.83     2.74    38.30     0.47     0.00
xvdb         0.00     0.00     0.00     0.00     0.00     0.00    11.62     0.00     0.23     0.19     2.13     0.16     0.00
xvdf         0.00     0.00     0.00     0.00     0.00     0.00    10.29     0.00     0.41     0.41     0.73     0.38     0.00
xvdg         0.00     0.00     0.00     0.00     0.00     0.00     9.12     0.00     0.36     0.35     1.20     0.34     0.00
xvdh         0.00     0.00     0.00     0.00     0.00     0.00    33.35     0.00     1.39     0.41     8.91     0.39     0.00
dm-0         0.00     0.00     0.00     0.00     0.00     0.00    11.66     0.00     0.46     0.46     0.00     0.37     0.00

06/30/2015 03:09:18 PM
avg-cpu:    %user    %nice  %system  %iowait   %steal    %idle
             0.00     0.00     0.50     0.00     0.00    99.50

Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
xvdap1       0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdb         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdf         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdg         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdh         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

06/30/2015 03:09:19 PM
avg-cpu:    %user    %nice  %system  %iowait   %steal    %idle
             0.00     0.00     0.50     0.00     0.00    99.50

Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
xvdap1       0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdb         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdf         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdg         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
xvdh         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
Essentially I need to parse the output in "chunks", where each chunk is separated by a timestamp.
I was looking at itertools.groupby(), but that doesn't seem to quite do what I want here - it seems meant for grouping consecutive lines that share a common key, or some property you can compute with a function.
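For what it's worth, groupby() can probably be coaxed into this with a key function that remembers the most recent timestamp it saw. A minimal sketch, assuming the timestamp format in the sample above; the names chunks_by_timestamp and TS_FORMAT are made up for illustration, and it relies on groupby() calling the key function once per line, in order:

```python
import itertools
from datetime import datetime

# Assumed format, matching stamps such as "06/30/2015 03:09:17 PM".
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def chunks_by_timestamp(lines):
    """Yield (timestamp, lines) pairs; hypothetical helper for illustration."""
    current = None

    def key(line):
        # Update the remembered timestamp whenever a line parses as one;
        # every other line inherits the key of the stamp before it.
        nonlocal current
        try:
            current = datetime.strptime(line.strip(), TS_FORMAT)
        except ValueError:
            pass
        return current

    for stamp, group in itertools.groupby(lines, key):
        if stamp is None:
            continue  # lines seen before the first timestamp
        # Keep the non-blank lines, then drop the timestamp line itself.
        yield stamp, [ln.strip() for ln in group if ln.strip()][1:]
```

One caveat with this approach: two adjacent chunks carrying the identical timestamp would be merged into one group, so a plain loop or generator may still be the safer choice.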
Another thought was something like:
    for line in f:
        if line.count("/") == 2 and line.count(":") == 2:
            current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
        while line.count("/") != 2 and line.count(":") != 2:
            print(line)
            continue
But that didn't quite seem to work.
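Two things likely go wrong in that attempt: the while loop never reads a new line, so its condition can never change, and the format string does not match the sample stamps (a four-digit year needs %Y, and a 12-hour clock with AM/PM needs %I and %p). The format half can be checked in isolation; a small sketch using one stamp copied from the output above:

```python
from datetime import datetime

stamp = "06/30/2015 03:09:17 PM"

# The format from the attempt above: two-digit year, 24-hour clock, no AM/PM.
try:
    datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
    first_attempt_ok = True
except ValueError:
    first_attempt_ok = False

# A format that matches the sample: four-digit year, 12-hour clock, AM/PM.
parsed = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
print(first_attempt_ok, parsed)
```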
Is there a Pythonic way of parsing the above iostat output and breaking it into chunks split by the timestamp?
Cheers,
Victor
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-07-01 02:02 +1000 |
| Message-ID | <mailman.192.1435680170.3674.python-list@python.org> |
| In reply to | #93330 |
On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
> Maybe define a class which wraps a file-like object. Its next() method (or
> is it __next__() method?) can just buffer up lines starting with one which
> successfully parses as a timestamp, accumulates all the rest, until a blank
> line or EOF is seen, then return that, either as a list of strings, one
> massive string, or some higher level representation (presumably an instance
> of another class) which represents one "paragraph" of iostat output.
next() in Py2, __next__() in Py3. But I'd do it, instead, as a
generator - that takes care of all the details, and you can simply
yield useful information whenever you have it. Something like this
(untested):
    import datetime

    def parse_iostat(lines):
        """Parse lines of iostat information, yielding ... something

        lines should be an iterable yielding separate lines of output
        """
        block = None
        for line in lines:
            line = line.strip()
            try:
                tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
                if block: yield block
                block = [tm]
            except ValueError:
                # It's not a new timestamp, so add it to the existing block
                # (anything before the first timestamp is skipped)
                if block is not None:
                    block.append(line)
        if block: yield block
This is a fairly classic line-parsing generator. You can pass it a
file-like object, a list of strings, or anything else that it can
iterate over; it'll yield some sort of aggregate object representing
each time's block. In this case, all it does is append strings to a
list, so this will result in a series of lists of strings, each one
representing a single timestamp; you can parse the other lines in any
way you like and aggregate useful data. Usage would be something like
this:
    with open("logfile") as f:
        for block in parse_iostat(f):
            # do stuff with block
            print(block)
This will work quite happily with an ongoing stream, too, so if you're
working with a pipe from a currently-running process, it'll pick stuff
up just fine. (However, since it uses the timestamp as its signature,
it won't yield anything till it gets the *next* timestamp. If the
blank line is sufficient to denote the end of a block, you could
change the loop to look for that instead.)
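A rough sketch of that blank-line variant follows. It assumes the usual iostat -xt layout, in which a blank line also separates the CPU figures from the device table, so only a blank line seen after the "Device" header is treated as the end of a block. The name parse_iostat_eager is hypothetical, and unlike the generator above this version drops blank lines rather than keeping them in the block:

```python
from datetime import datetime

# Assumed format, matching stamps such as "06/30/2015 03:09:17 PM".
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_iostat_eager(lines):
    """Yield each block as soon as the blank line after its device table is seen."""
    block = None
    in_devices = False
    for line in lines:
        line = line.strip()
        if block is None:
            # Waiting for a timestamp; anything else is ignored.
            try:
                block = [datetime.strptime(line, TS_FORMAT)]
            except ValueError:
                pass
            continue
        if line.startswith("Device"):
            in_devices = True
        if not line and in_devices:
            # Blank line after the device rows: the block is complete.
            yield block
            block, in_devices = None, False
        elif line:
            block.append(line)
    if block:
        # Input ended without a trailing blank line.
        yield block
```

With a live pipe this hands over each block once its trailing blank line arrives, instead of waiting for the next timestamp.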
Hope that helps!
ChrisA
| From | Victor Hooi <victorhooi@gmail.com> |
|---|---|
| Date | 2015-06-30 21:06 -0700 |
| Message-ID | <51f65e41-76e9-48c4-8f79-ba4ac060bbe3@googlegroups.com> |
| In reply to | #93334 |
Aha, cool, that's a good idea =) - it seems I should spend some time getting to know generators/iterators.
Also, sorry if this is basic, but once I have the "block" list itself, what is the best way to parse each relevant line?
In this case, the first line is a timestamp, the next two lines are system stats, and then a newline, and then one line for each block device.
I could just hardcode in the lines, but that seems ugly:
    for block in parse_iostat(f):
        for i, line in enumerate(block):
            if i == 0:
                print("timestamp is {}".format(line))
            elif i == 1 or i == 2:
                print("system stats: {}".format(line))
            elif i >= 4:
                print("disk stats: {}".format(line))
Is there a prettier or more Pythonic way of doing this?
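One option that avoids the index checks is to unpack the block by position. This is only a sketch: it assumes Python 3 and that every block has exactly the layout described above (timestamp, CPU header, CPU values, an empty string for the blank line, the device header, then the device rows), and split_block is a made-up name:

```python
def split_block(block):
    """Split one block by position; hypothetical helper for illustration."""
    # Star-unpacking collects however many device rows follow the header.
    stamp, cpu_header, cpu_values, _blank, dev_header, *dev_rows = block

    # Pair each CPU column name (after the "avg-cpu:" label) with its value.
    cpu = dict(zip(cpu_header.split()[1:], map(float, cpu_values.split())))

    # Pair each device column name (after the "Device:" label) with its value.
    names = dev_header.split()[1:]
    devices = {}
    for row in dev_rows:
        if not row:
            continue  # trailing blank line at the end of a block
        name, *values = row.split()
        devices[name] = dict(zip(names, map(float, values)))
    return stamp, cpu, devices
```

If the layout ever changes (an extra header line, a missing blank line), the unpacking raises ValueError or assigns the wrong lines, so this is best suited to output whose shape is known to be fixed.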
Thanks,
Victor
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-07-01 15:03 +1000 |
| Message-ID | <mailman.205.1435726989.3674.python-list@python.org> |
| In reply to | #93356 |
On Wed, Jul 1, 2015 at 2:06 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> Aha, cool, that's a good idea =) - it seems I should spend some time getting to know generators/iterators.
>
> Also, sorry if this is basic, but once I have the "block" list itself, what is the best way to parse each relevant line?
>
> In this case, the first line is a timestamp, the next two lines are system stats, and then a newline, and then one line for each block device.
>
> I could just hardcode in the lines, but that seems ugly:
>
>     for block in parse_iostat(f):
>         for i, line in enumerate(block):
>             if i == 0:
>                 print("timestamp is {}".format(line))
>             elif i == 1 or i == 2:
>                 print("system stats: {}".format(line))
>             elif i >= 4:
>                 print("disk stats: {}".format(line))
>
> Is there a prettier or more Pythonic way of doing this?
This is where you get into the nitty-gritty of writing a text parser.
Most of the work is in figuring out exactly what pieces of information
matter to you. I recommend putting most of the work into the
parse_iostat() function, and then yielding some really nice tidy
package that can be interpreted conveniently.
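As one possible shape for such a package, the generator could yield a dict per timestamp, using the header lines to name the columns. This is a sketch rather than a finished parser: the record layout, the name parse_iostat_records, and the assumption that every value after a header is numeric are all illustrative choices:

```python
from datetime import datetime

# Assumed format, matching stamps such as "06/30/2015 03:09:17 PM".
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_iostat_records(lines):
    """Yield {'timestamp', 'cpu', 'devices'} dicts, one per timestamp."""
    record = None
    columns = None
    section = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # blank lines carry no data here
        try:
            stamp = datetime.strptime(" ".join(fields), TS_FORMAT)
        except ValueError:
            stamp = None
        if stamp is not None:
            # A new timestamp closes the previous record.
            if record is not None:
                yield record
            record = {"timestamp": stamp, "cpu": {}, "devices": {}}
            section = None
        elif record is None:
            continue  # e.g. a banner line before the first timestamp
        elif fields[0] == "avg-cpu:":
            section, columns = "cpu", fields[1:]
        elif fields[0].startswith("Device"):
            section, columns = "devices", fields[1:]
        elif section == "cpu":
            record["cpu"] = dict(zip(columns, map(float, fields)))
        elif section == "devices":
            # First field is the device name, the rest are its metrics.
            record["devices"][fields[0]] = dict(zip(columns, map(float, fields[1:])))
    if record is not None:
        yield record
```

The caller can then write something like record["devices"]["xvdb"]["%util"] without caring which line of the block the value came from.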
ChrisA