Newsgroups: comp.lang.python
Date: Tue, 19 Mar 2013 23:14:54 -0700 (PDT)
Subject: Re: Need help in extracting lines from word using python
From: razinzamada@gmail.com
To: comp.lang.python@googlegroups.com
Cc: python-list@python.org

Thanks DAVE

On Tuesday, March 19, 2013 8:24:24 PM UTC+5:30, Dave Angel wrote:
> On 03/19/2013 10:20 AM, razinzamada@gmail.com wrote:
> > I'm currently trying to extract some data between 2 lines of an input file
>
> Your subject line says "from word". I'm only guessing that you might
> mean Microsoft Word, a proprietary program that does not, by default,
> save text files. The following code and description assumes a text
> file, so there's a contradiction.
>
> > using Python. the infile is set up such that there is a line -START- where I need the next 10 lines of code if and only if the -END- condition occurs before the next -START-. The -START- line occurs many times before the -END-. Heres a general example of what I mean:
> >
>
> In other words, you want to scan for -END-, then go backwards to -START-
> and use the first ten of the lines between?
> Try coding it that way, and
> perhaps it'll be easier.
>
> You also need to consider (and specify behavior for) the possibility
> that start and end are less than 10 lines apart.
>
> > blah
> > blah
> > -START-
> > 10 lines I DONT need
> > blah
> > -START-
> > 10 lines I need
> > blah
> > blah
> > -END-
> > blah
> > blah
> > -START-
> > 10 lines I dont need
> > blah
> > -START-
> >
> > .... and so on and so forth
> >
> > So far I have only been able to get the -START- + 10 lines for every iteration, but am at a total loss when it comes to specifying the condition to only write if the -END- condition comes before another -START- condition. I'm a bit of a newb, so any help will be greatly appreciated.
> >
> >
> > Here's the code I have for printing the -START- + 10 lines:
> >
> > infile = open('input.log')
> > out = open('output.txt', 'a')
> >
> > lines = infile.readlines()
> > for i, line in enumerate(lines):
> >     if (line.find('START')) > -1:
> >         out.write(line)
> >         out.write(lines[i + 1])
> >         out.write(lines[i + 2])
> >         out.write(lines[i + 3])
> >         out.write(lines[i + 4])
> >         out.write(lines[i + 5])
> >         out.write(lines[i + 6])
> >         out.write(lines[i + 7])
> >         out.write(lines[i + 8])
> >         out.write(lines[i + 9])
> >         out.write(lines[i + 10])
>
> or just out.writelines(lines[i:i+11]) to write out all 11 of them.
>
> >
>
> --
> DaveA
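
For reference, here is a minimal sketch of one way to do what Dave describes: keep only the lines that follow the most recent -START- before each -END-. It is not code from the thread. The names blocks_before_end and extract_file are made up for illustration, and two behaviours are assumptions on my part: the marker lines themselves are not written out, and if -START- and -END- are fewer than 10 lines apart the shorter block is written as-is.

```python
from typing import Iterable, Iterator, List

START = "-START-"
END = "-END-"


def blocks_before_end(lines: Iterable[str], count: int = 10) -> Iterator[List[str]]:
    """Yield up to `count` lines following the last START seen before each END."""
    block = None  # lines kept since the most recent START; None if no START is pending
    for line in lines:
        if START in line:
            # A newer START replaces any earlier candidate block.
            block = []
        elif END in line:
            # Only a START that was not superseded before this END counts.
            if block is not None:
                yield block
            block = None
        elif block is not None and len(block) < count:
            block.append(line)


def extract_file(src: str, dst: str, count: int = 10) -> None:
    """Hypothetical helper: append every qualifying block from src to dst."""
    with open(src) as infile, open(dst, "a") as outfile:
        for block in blocks_before_end(infile, count):
            outfile.writelines(block)
```

Calling extract_file('input.log', 'output.txt') would mirror the file names in the quoted snippet. Scanning forward and remembering only the latest -START- avoids going backwards through the file, so the input can be read one line at a time instead of with readlines().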