Need help in extracting lines from word using python

Started by: razinzamada@gmail.com
First post: 2013-03-19 07:20 -0700
Last post: 2013-03-19 23:14 -0700
Articles: 6 (3 participants)



Contents

  Need help in extracting lines from word using python razinzamada@gmail.com - 2013-03-19 07:20 -0700
    Re: Need help in extracting lines from word using python Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-19 14:41 +0000
      Re: Need help in extracting lines from word using python razinzamada@gmail.com - 2013-03-19 23:13 -0700
    Re: Need help in extracting lines from word using python Dave Angel <davea@davea.name> - 2013-03-19 10:54 -0400
      Re: Need help in extracting lines from word using python razinzamada@gmail.com - 2013-03-19 23:14 -0700
      Re: Need help in extracting lines from word using python razinzamada@gmail.com - 2013-03-19 23:14 -0700

#41505 — Need help in extracting lines from word using python

From: razinzamada@gmail.com
Date: 2013-03-19 07:20 -0700
Subject: Need help in extracting lines from word using python
Message-ID: <f9e61b45-759f-4286-a639-9fb826ec5597@googlegroups.com>

I'm currently trying to extract some data between two lines of an input file using Python. The input file is set up such that there is a -START- line, and I need the 10 lines that follow it if and only if an -END- line occurs before the next -START-. The -START- line can occur many times before an -END-. Here's a general example of what I mean:

blah
blah
-START-
10 lines I DONT need
blah
-START-
10 lines I need
blah
blah
-END-
blah
blah
-START-
10 lines I dont need
blah
-START-

.... and so on and so forth

So far I have only been able to get the -START- + 10 lines for every iteration, but I am at a total loss when it comes to writing them only if the -END- line comes before another -START- line. I'm a bit of a newbie, so any help will be greatly appreciated.


Here's the code I have for printing the -START- + 10 lines:

    in = open('input.log')
    out = open('output.txt', 'a')

    lines = in.readlines()
        for i, line in enumerate(lines):
            if (line.find('START')) > -1:
                out.write(line)
                out.write(lines[i + 1])
                out.write(lines[i + 2])
                out.write(lines[i + 3])
                out.write(lines[i + 4])
                out.write(lines[i + 5])
                out.write(lines[i + 6])
                out.write(lines[i + 7])
                out.write(lines[i + 8])
                out.write(lines[i + 9])
                out.write(lines[i + 10])



#41509

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-03-19 14:41 +0000
Message-ID: <51487911$0$6599$c3e8da3$5496439d@news.astraweb.com>
In reply to: #41505

On Tue, 19 Mar 2013 07:20:57 -0700, razinzamada wrote:

> I'm currently trying to extract some data between 2 lines of an input
> file using Python. the infile is set up such that there is a line
> -START- where I need the next 10 lines of code if and only if the -END-
> condition occurs before the next -START-. The -START- line occurs many
> times before the -END-. Heres a general example of what I mean:
> 
> blah
> blah
> -START-
> 10 lines I DONT need
> blah
> -START-
> 10 lines I need
> blah
> blah
> -END-
> blah
> blah
> -START-
> 10 lines I dont need
> blah
> -START-
> 
> .... and so on and so forth

[...]

> heres the code I have for printing the -START- + 10 lines:
> 
>     in = open('input.log')

No, it is not. "in" is a reserved word in Python, so that code cannot
possibly work; it will give a SyntaxError.
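
If in doubt about whether a name is reserved, the standard `keyword` module can be asked directly. A small sketch; the list `candidates` and the variable `reserved` are made up for illustration:

```python
import keyword

# Names a beginner might be tempted to use for file handles.
candidates = ["in", "out", "infile", "outfile"]

# keyword.iskeyword() reports whether a string is a reserved word.
reserved = [name for name in candidates if keyword.iskeyword(name)]
print(reserved)
```

Only names that come back from this check need renaming; the others are ordinary identifiers.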


Try this code. Untested, but it should do what you want.


# Accumulate lines between START and END lines, ignoring everything else.
with open('input.log') as infile, open('output.txt', 'a') as outfile:
    accum = []       # Lines seen since the most recent START.
    collect = False  # Initially we start by ignoring lines.
    for line in infile:
        if '-START-' in line:
            # Ignore any lines already seen, and start collecting.
            accum = []
            collect = True
        elif '-END-' in line:
            # Write the first ten accumulated lines.
            outfile.writelines(accum[:10])
            # Clear the accumulated lines
            accum = []
            # and stop collecting until the next START line.
            collect = False
        elif collect:
            accum.append(line)
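
The same logic can be wrapped in a generator so it can be tried on a list of lines without touching any files. This is only a sketch: the name `extract_blocks`, the `limit` parameter, and the sample data are made up for illustration, and it deliberately yields nothing for an -END- that has no preceding -START-.

```python
def extract_blocks(lines, limit=10):
    """Yield up to `limit` lines following the last -START- before each -END-."""
    accum = []
    collect = False
    for line in lines:
        if '-START-' in line:
            # A new START discards whatever the previous START collected.
            accum = []
            collect = True
        elif '-END-' in line:
            if collect:
                yield accum[:limit]
            accum = []
            collect = False
        elif collect:
            accum.append(line)


# Hypothetical sample input mirroring the layout in the question.
sample = """blah
-START-
not wanted
-START-
wanted 1
wanted 2
-END-
blah
-START-
not wanted either
""".splitlines(keepends=True)

blocks = list(extract_blocks(sample))
print(blocks)
```

To write the result to a file, each yielded block can be passed to `outfile.writelines()`.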



-- 
Steven



#41569

From: razinzamada@gmail.com
Date: 2013-03-19 23:13 -0700
Message-ID: <73c06136-249a-41fd-862f-c454362fbc62@googlegroups.com>
In reply to: #41509

Thanks steven



#41513

From: Dave Angel <davea@davea.name>
Date: 2013-03-19 10:54 -0400
Message-ID: <mailman.3513.1363704886.2939.python-list@python.org>
In reply to: #41505

On 03/19/2013 10:20 AM, razinzamada@gmail.com wrote:
> I'm currently trying to extract some data between 2 lines of an input file

Your subject line says "from word".  I'm only guessing that you might 
mean Microsoft Word, a proprietary program that does not, by default, 
save text files.  The following code and description assumes a text 
file, so there's a contradiction.


> using Python. the infile is set up such that there is a line -START- where I need the next 10 lines of code if and only if the -END- condition occurs before the next -START-. The -START- line occurs many times before the -END-. Heres a general example of what I mean:
>

In other words, you want to scan for -END-, then go backwards to -START- 
and use the first ten of the lines between?  Try coding it that way, and 
perhaps it'll be easier.

You also need to consider (and specify behavior for) the possibility
that start and end are fewer than 10 lines apart.
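
A rough sketch of that backwards scan, assuming the whole file fits in memory as a list of lines. The function name `blocks_before_end` and the `limit` parameter are hypothetical, and one possible answer to the short-block question is chosen here: if fewer than 10 lines lie between the markers, all of them are returned.

```python
def blocks_before_end(lines, limit=10):
    """For each -END-, walk back to the nearest -START- and take what follows it."""
    results = []
    for end_index, line in enumerate(lines):
        if '-END-' not in line:
            continue
        # Walk backwards from the END line looking for the nearest START.
        start_index = end_index - 1
        while start_index >= 0 and '-START-' not in lines[start_index]:
            if '-END-' in lines[start_index]:
                # An earlier END comes first, so this END has no START of its own.
                start_index = -1
                break
            start_index -= 1
        if start_index < 0:
            continue
        between = lines[start_index + 1:end_index]
        results.append(between[:limit])
    return results


# Hypothetical sample: the second -END- has no matching -START-.
sample = ["-START-", "a", "-START-", "b", "c", "-END-", "x", "-END-"]
found = blocks_before_end(sample)
print(found)
```

Whether a short block should be kept, padded, or skipped is a decision for the original poster; the slice simply keeps whatever is there.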

> blah
> blah
> -START-
> 10 lines I DONT need
> blah
> -START-
> 10 lines I need
> blah
> blah
> -END-
> blah
> blah
> -START-
> 10 lines I dont need
> blah
> -START-
>
> .... and so on and so forth
>
> so far I have only been able to get the -START- + 10 lines for every iteration, but am at a total loss when it comes to specifying the condition to only write if the -END- condition comes before another -START- condition. I'm a bit of a newb, so any help will be greatly appreciated.
>
>
> heres the code I have for printing the -START- + 10 lines:
>
>      in = open('input.log')
>      out = open('output.txt', 'a')
>
>      lines = in.readlines()
>          for i, line in enumerate(lines):
>              if (line.find('START')) > -1:
>                  out.write(line)
>                  out.write(lines[i + 1])
>                  out.write(lines[i + 2])
>                  out.write(lines[i + 3])
>                  out.write(lines[i + 4])
>                  out.write(lines[i + 5])
>                  out.write(lines[i + 6])
>                  out.write(lines[i + 7])
>                  out.write(lines[i + 8])
>                  out.write(lines[i + 9])
>                  out.write(lines[i + 10])

     or just        out.writelines(lines[i:i+11])     to write out all 11 of them.
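
A side benefit of the slice, shown in a small sketch with made-up data: near the end of the file a slice just comes back shorter, whereas indexing lines[i + 10] directly raises IndexError. `io.StringIO` stands in for the output file here.

```python
import io

# Hypothetical file contents with a START only two lines from the end.
lines = ["-START-\n", "one\n", "two\n"]
i = 0

# A slice past the end is simply shorter; no exception is raised.
chunk = lines[i:i + 11]
print(len(chunk))

# writelines() accepts a list of strings; write() would reject it.
out = io.StringIO()
out.writelines(chunk)

# Indexing past the end, as in lines[i + 10], raises IndexError.
try:
    lines[i + 10]
    index_error = False
except IndexError:
    index_error = True
print(index_error)
```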
>


-- 
DaveA



#41570

From: razinzamada@gmail.com
Date: 2013-03-19 23:14 -0700
Message-ID: <877193b9-80f3-466f-b93e-d9eb5150a313@googlegroups.com>
In reply to: #41513

Thanks DAVE



#41571

From: razinzamada@gmail.com
Date: 2013-03-19 23:14 -0700
Message-ID: <mailman.3548.1363760721.2939.python-list@python.org>
In reply to: #41513

Thanks DAVE

