
Re: Searching through two logfiles in parallel?

References <8990fb07-fbd2-47ab-8b83-97d34580ebe3@googlegroups.com>
Date 2013-01-07 22:58 +0000
Subject Re: Searching through two logfiles in parallel?
From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.240.1357599526.2939.python-list@python.org>



On 7 January 2013 22:10, Victor Hooi <victorhooi@gmail.com> wrote:
> Hi,
>
> I'm trying to compare two logfiles in Python.
>
> One logfile will have lines recording the message being sent:
>
>     05:00:06 Message sent - Value A: 5.6, Value B: 6.2, Value C: 9.9
>
> the other logfile has lines recording the message being received:
>
>     05:00:09 Message received - Value A: 5.6, Value B: 6.2, Value C: 9.9
>
> The goal is to compare the time stamp between the two - we can safely assume the timestamp on the message being received is later than the timestamp on transmission.
>
> If it was a direct line-by-line, I could probably use itertools.izip(), right?
>
> However, it's not a direct line-by-line comparison of the two files - the lines I'm looking for are interspersed among other loglines, and the time difference between sending/receiving is quite variable.
>
> So the idea is to iterate through the sending logfile - then iterate through the receiving logfile from that timestamp forwards, looking for the matching pair. Obviously I want to minimise the amount of back-forth through the file.
>
> Also, there is a chance that certain messages could get lost - so I assume there's a threshold after which I want to give up searching for the matching received message, and then just try to resync to the next sent message.
>
> Is there a Pythonic way, or some kind of idiom that I can use to approach this problem?

Assuming that you can impose a maximum time between the send and
receive timestamps, something like the following might work
(untested):

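def find_matching(logfile1, logfile2, maxdelta):
    """Yield (sent, received) pairs; received is 'No match' if none is found."""
    buf = {}  # received messages read ahead but not yet matched, by key
    logfile2 = iter(logfile2)  # one shared iterator, so logfile2 is read only once
    for msg1 in logfile1:
        # The match may already have been read while searching for an earlier msg1.
        if msg1.key in buf:
            yield msg1, buf.pop(msg1.key)
            continue
        maxtime = msg1.time + maxdelta
        for msg2 in logfile2:
            if msg2.key == msg1.key:
                yield msg1, msg2
                break
            buf[msg2.key] = msg2
            if msg2.time > maxtime:
                # Past the window: give up on msg1 and resync on the next one.
                yield msg1, 'No match'
                break
        else:
            # logfile2 was exhausted without finding a match.
            yield msg1, 'No match'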
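
The function above only needs objects with key and time attributes. As a
rough sketch of how the log lines might be turned into such objects: the
Message tuple, LINE_RE pattern and parse_log helper below are hypothetical
names, and the line layout is assumed from the two sample lines quoted
above. Times are stored as seconds since midnight, so logs that span
midnight are not handled.

```python
import re
from collections import namedtuple

# Hypothetical record type: find_matching only needs .key and .time.
Message = namedtuple("Message", ["time", "key"])

# Assumed line layout, taken from the two sample lines in the question.
LINE_RE = re.compile(r"^(\d\d):(\d\d):(\d\d) Message (?:sent|received) - (.+)$")

def parse_log(lines):
    """Yield a Message for each matching line, skipping unrelated log lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        hours, minutes, seconds = (int(g) for g in m.group(1, 2, 3))
        # Time as seconds since midnight; the values string serves as the key.
        yield Message(hours * 3600 + minutes * 60 + seconds, m.group(4))

sent = list(parse_log([
    "05:00:05 Some unrelated line",
    "05:00:06 Message sent - Value A: 5.6, Value B: 6.2, Value C: 9.9",
]))
received = list(parse_log([
    "05:00:09 Message received - Value A: 5.6, Value B: 6.2, Value C: 9.9",
]))
```

Two such generators, one per logfile, could then be passed to
find_matching with maxdelta given in seconds. Using the values as the key
assumes that no two messages within the window carry identical values.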


Oscar



