Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Date: Mon, 27 Aug 2012 07:42:52 -0500
From: Tim Chase <python.list@tim.thechases.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111120 Icedove/3.1.16
MIME-Version: 1.0
To: Huso <hussain.a.rasheed@gmail.com>
Subject: Re: Extract Text Table From File
References: <481dc39d-1dee-4ebe-97d5-ccad659f8c74@googlegroups.com>
In-Reply-To: <481dc39d-1dee-4ebe-97d5-ccad659f8c74@googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3874.1346071301.4697.python-list@python.org>
Lines: 52
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:27982

On 08/27/12 04:53, Huso wrote:
> Below is just ONE block of the traffic i have in the log files. There will be more in them with different data.
> 
> ROUTES TRAFFIC RESULTS, LSR
> TRG  MP   DATE   TIME
>  37  17 120824   0000
> 
> R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
> AABBCCO     6.4     204     0.0   115    1.0    113.4     144
> AABBCCI     3.0     293           115    1.0     37.0     171
> DDEEFFO     0.2       5     0.0    59    0.0    107.6       3
> HHGGFFI     0.3      15            30    0.0     62.2       4
> END

In the past I've used something like the following to pull out
columnar data, using the positions of the header names to locate
each column:

  import re
  token_re = re.compile(r'\b(\w+)\s*')
  f = open(FILENAME)  # FILENAME is the path to your log file
  headers = next(f)  # open()/next() work on Python 2.6+ and 3;
                     # in your case, you'd
                     # search forward until
                     # you got to a header line
                     # and use that TRAFF... line
  header_map = dict(
    # build a map of field-name to slice
    (
      matchobj.group(1).upper(),
      slice(*matchobj.span())
    )
    for matchobj
    in token_re.finditer(headers)
    )
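
To make the map concrete, here is a small sketch (Python 3) that
builds it from the header line in your sample and prints each column
name with the slice bounds it gets. The header string is copied from
your message with the "> " quoting removed; the dict comprehension is
just a shorter spelling of the dict(...) call above.

```python
import re

token_re = re.compile(r'\b(\w+)\s*')

# Header line taken from the sample block in the quoted message
headers = "R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW\n"

# Each header name maps to a slice running from the start of that name
# up to the start of the next one (trailing whitespace is included).
header_map = {
    m.group(1).upper(): slice(*m.span())
    for m in token_re.finditer(headers)
}

for name, sl in header_map.items():
    print(name, sl.start, sl.stop)
```

Each slice runs up to the start of the next header, so a value that
is right-aligned under its header lands inside the slice with some
padding around it.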

You can then access your values as you iterate through the rest of
the rows:

  for row in f:
    if row.startswith("END"): break
    traff = float(row[header_map["TRAFF"]])
    # ...

which makes the code pretty easy to read, since you look up each
field by its column name, much as you would with csv.DictReader.
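
Putting the pieces together, a rough end-to-end sketch (Python 3)
might look like the following. The names parse_block and SAMPLE are
made up for illustration, and it assumes the header line is the first
one that has TRAFF as a whole word, which may not hold for every block
in your logs.

```python
import re

token_re = re.compile(r'\b(\w+)\s*')

# Sample block from the quoted message, minus the "> " quoting
SAMPLE = """\
ROUTES TRAFFIC RESULTS, LSR
TRG  MP   DATE   TIME
 37  17 120824   0000

R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
AABBCCO     6.4     204     0.0   115    1.0    113.4     144
AABBCCI     3.0     293           115    1.0     37.0     171
DDEEFFO     0.2       5     0.0    59    0.0    107.6       3
HHGGFFI     0.3      15            30    0.0     62.2       4
END
"""

def parse_block(lines):
    """Return one {column name: stripped text} dict per data row."""
    lines = iter(lines)
    # Search forward to the header line (assumption: it is the first
    # line containing TRAFF as a whitespace-separated word)
    for line in lines:
        if "TRAFF" in line.split():
            break
    else:
        return []  # no header line found
    header_map = {
        m.group(1).upper(): slice(*m.span())
        for m in token_re.finditer(line)
    }
    rows = []
    for row in lines:
        if row.startswith("END"):
            break
        rows.append({name: row[sl].strip() for name, sl in header_map.items()})
    return rows

rows = parse_block(SAMPLE.splitlines(True))
for r in rows:
    # A blank field comes back as "", so test it before converting
    ccong = float(r["CCONG"]) if r["CCONG"] else None
    print(r["R"], float(r["TRAFF"]), ccong)
```

For a real log you would pass the open file object instead of
SAMPLE.splitlines(True), and call parse_block again for each
following block.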

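It has the advantage that, if data in a column contains spaces, or a
column is blank (as CCONG is in two of your sample rows), it won't
throw off the rest of the row as a .split() would. You do still need
to check for a blank slice before handing it to float().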
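
To see what goes wrong with .split(), here is a quick comparison
(Python 3) on the second data row from your sample. For brevity the
slice is computed with header.index() rather than the regex map; that
shortcut is only for this illustration.

```python
header = "R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW"
row = "AABBCCI     3.0     293           115    1.0     37.0     171"

names = header.split()
values = row.split()

# CCONG is blank in this row, so split() yields one value too few and
# zip() quietly pairs every later value with the wrong column name.
by_split = dict(zip(names, values))
print(len(names), len(values))
print(by_split["CCONG"])  # this is really the NDV value

# Slicing by header position keeps the columns aligned.
ccong_slice = slice(header.index("CCONG"), header.index("NDV"))
print(repr(row[ccong_slice].strip()))
```

With .split() nothing raises an error; the values just shift one
column to the left, which is easy to miss.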

-tkc