


Re: Extract Text Table From File

Return-Path <gandalf@shopzeus.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
Date Mon, 27 Aug 2012 13:07:37 +0200
From Laszlo Nagy <gandalf@shopzeus.com>
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version 1.0
To python-list@python.org
Subject Re: Extract Text Table From File
References <481dc39d-1dee-4ebe-97d5-ccad659f8c74@googlegroups.com> <mailman.3865.1346062345.4697.python-list@python.org> <1a0794ef-2186-4fae-a0be-61e07eb4e9e6@googlegroups.com>
In-Reply-To <1a0794ef-2186-4fae-a0be-61e07eb4e9e6@googlegroups.com>
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 7bit
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.12
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.3868.1346065662.4697.python-list@python.org> (permalink)
Lines 56
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1346065662 news.xs4all.nl 6965 [2001:888:2000:d::a6]:54692
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:27975



> Hi,
>
> Thank you for the information.
> The exact way I want to extract the data is as below.
>
> TRG, MP, DATE and TIME are common for that certain block of traffic.
> So I am using those and dumping them with the rest of the data into sql.
> The table will have all headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).
>
> So from this text, the first data will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.

How many blocks do you have in a file? Do you want to create different
data sets for those blocks? How do you identify those blocks? (E.g. are
they all saved into the same database table the same way?)

Anyway, here is something:

import re

# Matches a data row such as:
# AABBCCO     6.4     204     0.0   115    1.0    113.4     144
pattern = re.compile(r"([A-Z]{7})" + 7 * r"\s+([\d.]+)")

# Column header line that starts each block
HEADER = ['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']

#
# This is how you iterate over a file and process its lines
#
blocks = []
block = None
with open("test.txt", "r") as fin:
    for line in fin:
        # This is one possible way to extract values.
        values = line.strip().split()
        if values == HEADER:
            # A header line starts a new block; save the previous one first.
            if block is not None:
                blocks.append(block)
            block = []
        elif block is not None:
            res = pattern.match(line.strip())
            if res:
                # Keep the 7-letter name as text, convert the rest to float.
                values = list(res.groups())
                values[1:] = map(float, values[1:])
                block.append(values)
# Do not forget the last block in the file.
if block is not None:
    blocks.append(block)

for idx, block in enumerate(blocks):
    print("BLOCK", idx)
    for values in block:
        print(values)

This prints:

BLOCK 0
['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0]
['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0]
['HHGGFFO', 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
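
For the SQL part mentioned in the quoted message, here is a minimal sketch using the standard sqlite3 module. It assumes the four block-level values (TRG, MP, DATE, TIME) have already been extracted somehow, since the lines that carry them are not shown here. The table name "traffic", the helper store_block and the in-memory database are made up for illustration; the rows are copied from the output above.

```python
import sqlite3

# Column names taken from the quoted message.
COLUMNS = ["TRG", "MP", "DATE", "TIME", "R", "TRAFF", "NBIDS", "CCONG",
           "NDV", "ANBLO", "MHTIME", "NBANSW"]


def store_block(conn, common, rows):
    """Insert one block; common = (TRG, MP, DATE, TIME) shared by all rows."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        "INSERT INTO traffic VALUES (%s)" % placeholders,
        [tuple(common) + tuple(row) for row in rows],
    )


# Hypothetical in-memory database and table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (%s)" % ", ".join('"%s"' % c for c in COLUMNS))

# Assumption: these four values were already read for this block.
common = (37, 17, "120824", "0000")
rows = [
    ['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0],
    ['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0],
]
store_block(conn, common, rows)
conn.commit()

for record in conn.execute("SELECT * FROM traffic"):
    print(record)
```

Each stored record is the four common values followed by the eight values of one parsed row, which is the layout described in the quoted message. How you obtain the common values depends on what the rest of your file looks like.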



Thread

Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 02:53 -0700
  Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 12:12 +0200
    Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 03:34 -0700
      Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 13:07 +0200
    Re: Extract Text Table From File Huso <hussain.a.rasheed@gmail.com> - 2012-08-27 04:23 -0700
      Re: Extract Text Table From File Laszlo Nagy <gandalf@shopzeus.com> - 2012-08-27 13:55 +0200
    Re: Extract Text Table From File Ramchandra Apte <maniandram01@gmail.com> - 2012-09-05 06:08 -0700
    Re: Extract Text Table From File Tim Chase <python.list@tim.thechases.com> - 2012-09-05 11:25 -0500
  Re: Extract Text Table From File Tim Chase <python.list@tim.thechases.com> - 2012-08-27 07:42 -0500
