comp.lang.python, thread #27970

Extract Text Table From File

Started by	Huso <hussain.a.rasheed@gmail.com>
First post	2012-08-27 02:53 -0700
Last post	2012-08-27 07:42 -0500
Articles	11 — 4 participants


  Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 02:53 -0700
    Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 12:12 +0200
      Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 03:34 -0700
      Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 03:34 -0700
        Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 13:07 +0200
      Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 04:23 -0700
      Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 04:23 -0700
        Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 13:55 +0200
      Re: Extract Text Table From File Ramchandra Apte <maniandram01@gmail.com> - 2012-09-05 06:08 -0700
      Re: Extract Text Table From File Tim Chase <python.list@tim.thechases.com> - 2012-09-05 11:25 -0500
    Re: Extract Text Table From File Tim Chase <python.list@tim.thechases.com> - 2012-08-27 07:42 -0500

#27970 — Extract Text Table From File

From	Huso <hussain.a.rasheed@gmail.com>
Date	2012-08-27 02:53 -0700
Subject	Extract Text Table From File
Message-ID	<481dc39d-1dee-4ebe-97d5-ccad659f8c74@googlegroups.com>

Hi,

I am trying to extract some text table data from a log file. I have tried different methods, but I can't seem to get anything to work. I am also fairly new to Python, so I would appreciate it if someone could help me out.

Below is just ONE block of the traffic I have in the log files. There will be more blocks in them, with different data.

ROUTES TRAFFIC RESULTS, LSR
TRG  MP   DATE   TIME
 37  17 120824   0000

R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
AABBCCO     6.4     204     0.0   115    1.0    113.4     144
AABBCCI     3.0     293           115    1.0     37.0     171
DDEEFFO     0.2       5     0.0    59    0.0    107.6       3
EEFFEEI     0.0       0            59    0.0      0.0       0
HHGGFFO     0.0       0     0.0    30    0.0      0.0       0
HHGGFFI     0.3      15            30    0.0     62.2       4
END

Thanks


#27971

From	Laszlo Nagy <gandalf@shopzeus.com>
Date	2012-08-27 12:12 +0200
Message-ID	<mailman.3865.1346062345.4697.python-list@python.org>
In reply to	#27970

On 2012-08-27 11:53, Huso wrote:
> Hi,
>
> I am trying to extract some text table data from a log file. I am trying different methods, but I don't seem to get anything to work. I am kind of new to python as well. Hence, appreciate if someone could help me out.

#
# Write test data to test.txt
#

data = """
ROUTES TRAFFIC RESULTS, LSR
TRG  MP   DATE   TIME
  37  17 120824   0000

R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
AABBCCO     6.4     204     0.0   115    1.0    113.4     144
AABBCCI     3.0     293           115    1.0     37.0     171
DDEEFFO     0.2       5     0.0    59    0.0    107.6       3
EEFFEEI     0.0       0            59    0.0      0.0       0
HHGGFFO     0.0       0     0.0    30    0.0      0.0       0
HHGGFFI     0.3      15            30    0.0     62.2       4
END
"""
with open("test.txt", "w") as fout:
    fout.write(data)

#
# This is how you iterate over a file and process its lines
#
with open("test.txt") as fin:
    for line in fin:
        # This is one possible way to extract values.
        values = line.strip().split()
        print(values)


This will print:

[]
['ROUTES', 'TRAFFIC', 'RESULTS,', 'LSR']
['TRG', 'MP', 'DATE', 'TIME']
['37', '17', '120824', '0000']
[]
['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']
['AABBCCO', '6.4', '204', '0.0', '115', '1.0', '113.4', '144']
['AABBCCI', '3.0', '293', '115', '1.0', '37.0', '171']
['DDEEFFO', '0.2', '5', '0.0', '59', '0.0', '107.6', '3']
['EEFFEEI', '0.0', '0', '59', '0.0', '0.0', '0']
['HHGGFFO', '0.0', '0', '0.0', '30', '0.0', '0.0', '0']
['HHGGFFI', '0.3', '15', '30', '0.0', '62.2', '4']
['END']


The "values" list in the last line contains these values. This will work 
only if you don't have spaces in your values. Otherwise you can use 
regular expressions to parse a line. See here:

http://docs.python.org/library/re.html

Since you did not give any specification of your file format, it would 
be hard to give a concrete program that parses your file(s).

Best,

     Laszlo
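A detail the output above already shows: `split()` produces no field at all for a blank cell, so the `...I` rows (AABBCCI, EEFFEEI, HHGGFFI) come back with seven values instead of eight, and every value after NBIDS lands one column too far to the left. A minimal Python 3 sketch of the regular-expression route follows. It assumes, from the single sample block only, that CCONG is the only column that can be blank, so that group is made optional; `ROW_RE` and `parse_row` are illustrative names, and the group names simply mirror the column headers.

```python
import re

# One data row: a 7-letter route name followed by the numeric columns.
# Assumption: only CCONG may be blank, so its group is optional.
ROW_RE = re.compile(
    r"(?P<R>[A-Z]{7})"
    r"\s+(?P<TRAFF>[\d.]+)"
    r"\s+(?P<NBIDS>\d+)"
    r"(?:\s+(?P<CCONG>[\d.]+))?"
    r"\s+(?P<NDV>\d+)"
    r"\s+(?P<ANBLO>[\d.]+)"
    r"\s+(?P<MHTIME>[\d.]+)"
    r"\s+(?P<NBANSW>\d+)\s*$"
)


def parse_row(line):
    """Return a dict of column name -> string value, or None for non-data lines."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()  # a blank CCONG comes back as None


for line in [
    "AABBCCO     6.4     204     0.0   115    1.0    113.4     144",
    "AABBCCI     3.0     293           115    1.0     37.0     171",
    "END",
]:
    print(parse_row(line))
```

Because the optional CCONG group is tried first, a full eight-column row fills it; only when that leaves too few values for NDV through NBANSW does the regex engine backtrack and leave CCONG as `None`.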


#27972

From	Huso <hussain.a.rasheed@gmail.com>
Date	2012-08-27 03:34 -0700
Message-ID	<mailman.3866.1346063653.4697.python-list@python.org>
In reply to	#27971

On Monday, August 27, 2012 3:12:14 PM UTC+5, Laszlo Nagy wrote:
> Since you did not give any specification on your file format, it would 
> be hard to give a concrete program that parses your file(s)

Hi,

Thank you for the information.
This is exactly how I want to extract the data:

TRG, MP, DATE and TIME are common to each block of traffic.
So I want to take those and store them, together with the rest of the data, in SQL.
The table will have all of the headers as columns (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).

So from this text, the first row will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.

Thanking,
Huso
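The table described above can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not a prescribed schema: the table name `traffic`, the in-memory database, the column types, and keeping TRG, MP, DATE and TIME as text (so `0000` keeps its leading zeros) are all choices made here. The rows are assumed to be already parsed and converted; each one is prefixed with its block's four header values before insertion, and `None` is stored as SQL NULL for a blank CCONG cell. `store_block` is a hypothetical helper.

```python
import sqlite3

COLUMNS = ["TRG", "MP", "DATE", "TIME", "R", "TRAFF", "NBIDS",
           "CCONG", "NDV", "ANBLO", "MHTIME", "NBANSW"]

# Hypothetical schema: block header values as TEXT, measurements as numbers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS traffic (
    TRG TEXT, MP TEXT, DATE TEXT, TIME TEXT, R TEXT,
    TRAFF REAL, NBIDS INTEGER, CCONG REAL, NDV INTEGER,
    ANBLO REAL, MHTIME REAL, NBANSW INTEGER
)
"""


def store_block(conn, block_header, rows):
    """Insert each row, prefixed with the block's (TRG, MP, DATE, TIME)."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = "INSERT INTO traffic (%s) VALUES (%s)" % (", ".join(COLUMNS), placeholders)
    conn.executemany(sql, [tuple(block_header) + tuple(row) for row in rows])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
header = ("37", "17", "120824", "0000")
rows = [
    ("AABBCCO", 6.4, 204, 0.0, 115, 1.0, 113.4, 144),
    ("AABBCCI", 3.0, 293, None, 115, 1.0, 37.0, 171),  # blank CCONG
]
store_block(conn, header, rows)
for record in conn.execute("SELECT * FROM traffic"):
    print(record)
```

Using `?` placeholders with `executemany` lets `sqlite3` handle quoting and NULLs instead of building the SQL text by hand.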


#27975

From	Laszlo Nagy <gandalf@shopzeus.com>
Date	2012-08-27 13:07 +0200
Message-ID	<mailman.3868.1346065662.4697.python-list@python.org>
In reply to	#27973

> Hi,
>
> Thank you for the information.
> The exact way I want to extract the data is like as below.
>
> TRG, MP and DATE and TIME is common for that certain block of traffic.
> So I am using those and dumping it with the rest of the data into sql.
> Table will have all headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).
>
> So from this text, the first data will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.
How many blocks do you have in a file? Do you want to create different 
data sets for those blocks? How do you identify those blocks? (E.g. are 
they all saved into the same database table the same way?)

Anyway, here is something:

import re

# AABBCCO     6.4     204     0.0   115    1.0    113.4     144
pattern = re.compile(r"([A-Z]{7})" + 7 * r"\s+([\d.]+)")
HEADER = ['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']

#
# This is how you iterate over a file and process its lines
#
blocks = []
block = None
with open("test.txt") as fin:
    for line in fin:
        # This is one possible way to extract values.
        values = line.strip().split()
        if values == HEADER:
            # A column header line starts a new block.
            if block is not None:
                blocks.append(block)
            block = []
        elif block is not None:
            res = pattern.match(line.strip())
            if res:
                values = list(res.groups())
                values[1:] = [float(v) for v in values[1:]]
                block.append(values)
if block is not None:
    blocks.append(block)

for idx, block in enumerate(blocks):
    print("BLOCK %d" % idx)
    for values in block:
        print(values)

This prints:

BLOCK 0
['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0]
['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0]
['HHGGFFO', 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]


#27976

From	Huso <hussain.a.rasheed@gmail.com>
Date	2012-08-27 04:23 -0700
Message-ID	<mailman.3869.1346066590.4697.python-list@python.org>
In reply to	#27971

Hi,

There can be any number of blocks in the log file.
I identify each block by its start header 'ROUTES TRAFFIC RESULTS, LSR' and its closing 'END'. Each block has a unique [date + time] value.

I tried the code you posted, and it works for the data part.
But I also need to get the block's TRG, MP, DATE and TIME along with that data. This is the part I'm really stuck on.

Thanking,
Huso
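One way to carry the block's TRG, MP, DATE and TIME over to its rows is a small state machine: remember the values on the line after the `TRG  MP   DATE   TIME` header, then prefix every data row with them until `END`. Note also that the pattern in the previous code requires exactly seven numbers after the route name, so rows with a blank CCONG cell never match; that is why only three of the six rows appear in its output. The Python 3 sketch below assumes the layout of the single sample block in this thread (header values on the line directly after the `TRG ...` line, CCONG as the only column that may be blank) and uses a regex with an optional CCONG group; `ROW_RE`, `parse_blocks` and `SAMPLE` are illustrative names.

```python
import re

# Route name, then the numeric columns; CCONG (4th group) may be blank.
ROW_RE = re.compile(
    r"([A-Z]{7})\s+([\d.]+)\s+(\d+)(?:\s+([\d.]+))?"
    r"\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s*$"
)


def parse_blocks(lines):
    """Yield (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW)."""
    block_header = None          # (TRG, MP, DATE, TIME) of the current block
    expect_header_values = False
    for line in lines:
        text = line.strip()
        if text.startswith("ROUTES TRAFFIC RESULTS"):
            block_header = None  # a new block starts
        elif text.split() == ["TRG", "MP", "DATE", "TIME"]:
            expect_header_values = True  # the values follow on the next line
        elif expect_header_values:
            block_header = tuple(text.split())
            expect_header_values = False
        elif text == "END":
            block_header = None  # block finished
        elif block_header is not None:
            match = ROW_RE.match(text)
            if match:  # blank lines and the "R  TRAFF ..." line do not match
                yield block_header + match.groups()


SAMPLE = """\
ROUTES TRAFFIC RESULTS, LSR
TRG  MP   DATE   TIME
 37  17 120824   0000

R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
AABBCCO     6.4     204     0.0   115    1.0    113.4     144
AABBCCI     3.0     293           115    1.0     37.0     171
END
"""

for record in parse_blocks(SAMPLE.splitlines()):
    print(record)
```

The fields stay strings (CCONG is `None` when blank); converting them and writing them to a database are separate steps. For a real log, pass the open file object instead of `SAMPLE.splitlines()`.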


#27979

From	Laszlo Nagy <gandalf@shopzeus.com>
Date	2012-08-27 13:55 +0200
Message-ID	<mailman.3871.1346068558.4697.python-list@python.org>
In reply to	#27977

On 2012-08-27 13:23, Huso wrote:
> Hi,
>
> There can be any number of blocks in the log file.
> I distinguish the block by the start header 'ROUTES TRAFFIC RESULTS, LSR' and ending in 'END'. Each block will have a unique [date + time] value.
>
> I tried the code you mentioned, it works for the data part.
> But I need to get the TRG, MP, DATE and TIME for the block with those data as well. This is the part that i'm really tangled in.
>
> Thanking,
> Huso
Well, I suggest that you try to understand my code and make changes to 
it. It is not too hard. Start by reading the documentation of the 
"re" module. Learning Python is worth it, especially for mining data out 
of text files. :-)

Best,

    Laszlo


#28490

From	Ramchandra Apte <maniandram01@gmail.com>
Date	2012-09-05 06:08 -0700
Message-ID	<mailman.233.1346850488.27098.python-list@python.org>
In reply to	#27971

On Monday, 27 August 2012 15:42:14 UTC+5:30, Laszlo Nagy  wrote:
> The "values" list in the last line contains these values. This will work 
> only if you don't have spaces in your values. Otherwise you can use 
> regular expressions to parse a line. See here:
>
> http://docs.python.org/library/re.html
>
the csv module should be used for this not regex


#28520

From	Tim Chase <python.list@tim.thechases.com>
Date	2012-09-05 11:25 -0500
Message-ID	<mailman.252.1346862235.27098.python-list@python.org>
In reply to	#27971

[trimming out a bunch of superfluous text so the thread is actually
readable]

On 09/05/12 08:08, Ramchandra Apte wrote:
> On Monday, 27 August 2012 15:42:14 UTC+5:30, Laszlo Nagy  wrote:
>> On 2012-08-27 11:53, Huso wrote:
>>> I am trying to extract some text table data from a log file
>>
>> fin = open("test.txt","r")
>>
>> for line in fin:
>>
>>      # This is one possible way to extract values.
>>
>>      values = line.strip().split()
>>
>>      print values
>
> the csv module should be used for this not regex

The problem is that the csv module expects a single delimiter
character, not columnar data.

-tkc
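A quick Python 3 illustration of that point, using two rows from the sample: with `delimiter=" "`, `csv.reader` ends a field at every single space, so the alignment padding turns into runs of empty strings whose length depends on how wide each value is.

```python
import csv

rows = [
    "AABBCCO     6.4     204     0.0   115    1.0    113.4     144",
    "AABBCCI     3.0     293           115    1.0     37.0     171",
]

for fields in csv.reader(rows, delimiter=" "):
    # Each single space ends a field, so the padding becomes empty strings.
    print(len(fields), fields[:6])
```

Throwing away the empty strings just gives back what `split()` returns, so the blank CCONG cell is still lost.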


#27982

From	Tim Chase <python.list@tim.thechases.com>
Date	2012-08-27 07:42 -0500
Message-ID	<mailman.3874.1346071301.4697.python-list@python.org>
In reply to	#27970

On 08/27/12 04:53, Huso wrote:
> Below is just ONE block of the traffic i have in the log files. There will be more in them with different data.
> 
> ROUTES TRAFFIC RESULTS, LSR
> TRG  MP   DATE   TIME
>  37  17 120824   0000
> 
> R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
> AABBCCO     6.4     204     0.0   115    1.0    113.4     144
> AABBCCI     3.0     293           115    1.0     37.0     171
> DDEEFFO     0.2       5     0.0    59    0.0    107.6       3
> HHGGFFI     0.3      15            30    0.0     62.2       4
> END

In the past I've used something like the following to find columnar
data based on some found headers:

  import re
  token_re = re.compile(r'\b(\w+)\s*')
  f = open(FILENAME)
  headers = next(f)  # in your case, you'd
                     # search forward until
                     # you got to a header line
                     # and use that TRAFF... line
  header_map = dict(
    # build a map of field-name to slice
    (
      matchobj.group(1).upper(),
      slice(*matchobj.span())
    )
    for matchobj
    in token_re.finditer(headers)
    )

You can then access your values as you iterate through the rest of
the rows:

  for row in f:
    if row.startswith("END"): break
    traff = float(row[header_map["TRAFF"]])
    # ...

which makes the code pretty easy to read, effectively turning it
into a CSV file.

It has the advantage that, if for some reason data in the columns
have spaces in them, it won't throw off the row as a .split() would.

-tkc
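A self-contained Python 3 variant of the header-slice approach, for experimenting without a log file. It differs from the snippet above in two assumed details: it reads from a list of lines, and the hypothetical `parse_cell` helper returns `None` for a blank cell (such as CCONG in the `...I` rows) instead of letting `float()` raise `ValueError` on an all-space slice. Like the original, it relies on every value lying within the span of its header word plus the spaces after it, which holds for the sample block because the numbers are right-aligned under their headers.

```python
import re

# Each header word, plus the spaces after it, defines a column slice.
TOKEN_RE = re.compile(r"\b(\w+)\s*")


def build_header_map(header_line):
    """Map each upper-cased header name to the slice of the line it covers."""
    return {
        m.group(1).upper(): slice(*m.span())
        for m in TOKEN_RE.finditer(header_line)
    }


def parse_cell(row, header_map, name):
    """Hypothetical helper: float value of a column, or None if the cell is blank."""
    text = row[header_map[name]].strip()
    return float(text) if text else None


lines = [
    "R         TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW",
    "AABBCCO     6.4     204     0.0   115    1.0    113.4     144",
    "AABBCCI     3.0     293           115    1.0     37.0     171",
    "END",
]

header_map = build_header_map(lines[0])
for row in lines[1:]:
    if row.startswith("END"):
        break
    name = row[header_map["R"]].strip()
    print(name, parse_cell(row, header_map, "TRAFF"), parse_cell(row, header_map, "CCONG"))
```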
