From: Laszlo Nagy
Date: Mon, 27 Aug 2012 13:07:37 +0200
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Extract Text Table From File

> Hi,
>
> Thank you for the information.
> The exact way I want to extract the data is like as below.
>
> TRG, MP and DATE and TIME is common for that certain block of traffic.
> So I am using those and dumping it with the rest of the data into sql.
> Table will have all headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, ANBLO, MHTIME, NBANSW).
>
> So from this text, the first data will be 37, 17, 120824, 0000, AABBCCO, 6.4, 204, 0.0, 115, 1.0, 113.4, 144.

How many blocks do you have in a file? Do you want to create different data sets for those blocks? How do you identify those blocks? (E.g. are they all saved into the same database table the same way?)

Anyway, here is something:

    import re

    # Matches a data row such as:
    # AABBCCO 6.4 204 0.0 115 1.0 113.4 144
    pattern = re.compile(r"""([A-Z]{7})""" + 7 * r"""\s+([\d\.]+)""")

    #
    # This is how you iterate over a file and process its lines
    #
    blocks = []
    block = None
    with open("test.txt", "r") as fin:
        for line in fin:
            # This is one possible way to extract values.
            values = line.strip().split()
            if values == ['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']:
                # A header line starts a new block; save the previous one first.
                if block is not None:
                    blocks.append(block)
                block = []
            elif block is not None:
                res = pattern.match(line.strip())
                if res:
                    values = list(res.groups())
                    values[1:] = map(float, values[1:])
                    block.append(values)
    # Do not forget the last block.
    if block is not None:
        blocks.append(block)

    for idx, block in enumerate(blocks):
        print("BLOCK", idx)
        for values in block:
            print(values)

This prints:

    BLOCK 0
    ['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0]
    ['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0]
    ['HHGGFFO', 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]
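
Since the stated goal is to dump each row into SQL together with the block-wide TRG, MP, DATE and TIME values, the last step might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the use of sqlite3 with an in-memory database, the table name "traffic", and the helper store_block are all hypothetical. The common values (37, 17, 120824, 0000) are simply copied from the quoted example; how they are read from the file depends on the block layout, which is not shown in this thread.

```python
import sqlite3

# Column names taken from the header list in the quoted message.
COLUMNS = ["TRG", "MP", "DATE", "TIME", "R", "TRAFF", "NBIDS", "CCONG",
           "NDV", "ANBLO", "MHTIME", "NBANSW"]


def store_block(conn, common, block):
    """Hypothetical helper: insert each row of one block, prefixed with
    the values shared by the whole block (TRG, MP, DATE, TIME)."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        "INSERT INTO traffic VALUES (%s)" % placeholders,
        [tuple(common) + tuple(row) for row in block],
    )


# Assumption: an in-memory SQLite database with an untyped table "traffic".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (%s)"
             % ", ".join('"%s"' % name for name in COLUMNS))

# Block-wide values copied from the quoted example; DATE and TIME are kept
# as strings so that the leading zeros of "0000" survive.
common = (37, 17, "120824", "0000")

# One block in the shape produced by the parsing loop.
block = [
    ["AABBCCO", 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0],
    ["DDEEFFO", 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0],
    ["HHGGFFO", 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0],
]

store_block(conn, common, block)
conn.commit()

rows = conn.execute("SELECT * FROM traffic ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```

If different blocks should go to different tables, store_block would need a table name argument; that depends on the answers to the questions above.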