Re: delete from pattern to pattern if it contains match

Newsgroups	comp.lang.python
Date	2016-04-24 23:29 -0700
References	(2 earlier) <lf5fuufqe81.fsf@ling.helsinki.fi> <991c5867-27d1-4e75-aa52-a7d47e626b74@googlegroups.com> <nfcqjs$guu$1@ger.gmane.org> <mailman.11.1461317067.2861.python-list@python.org> <lf57ffpdhl5.fsf@ling.helsinki.fi>
Message-ID	<ee696bf4-706f-4113-bb91-d231ebf47b05@googlegroups.com> (permalink)
Subject	Re: delete from pattern to pattern if it contains match
From	harirammanohar@gmail.com

On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
> 
> > harirammanohar@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar@gmail.com writes:
> >>> 
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >> 
> >>> >> can you help me out in doing below.
> >>> >> 
> >>> >> file:
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> need to delete from start to end if it contains mango in a file...
> >>> >> 
> >>> >> output should be:
> >>> >> 
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> Thank you
> >>> >
> >>> > Can anyone guide me? Why is XML tree parsing not working if I have
> >>> > root.tag and root.attrib as mentioned in an earlier post...
> >>> 
> >>> Assuming the real data consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>> 
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>> 
> >>> from io import StringIO
> >>> 
> >>> text = '''\
> >>> <start>
> >>>   guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   orange
> >>> fruit
> >>> <end>
> >>> '''
> >>> 
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>> 
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>> 
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >> 
> >> Hi,
> >> 
> >> not working....this is the output i am getting...
> >> 
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
> 
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
> 
> >>  <start>
> >>    guava
> >>  fruit
> >> 
> >> <start>
> >>    orange
> >>  fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
> 
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
> 
> It was already known that this is not the actual format of the data.
> 
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
> 
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
> 
> >>>     hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i. e. at the beginning of all
> > lines except the first) and adds a trailing newline.
> 
> Yes, I forgot about the space. Sorry about that.
> 
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
> 
>    print(*record, sep = "", end = "<end>\n")
> 
> Or so:
> 
>    print(*record, sep = "")
>    print("<end>")
> 
> Or so:
> 
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        print("<end>")
> 
> Or:
> 
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        if record and not record[-1].strip() == "<end>":
>            print("<end>")
> 
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
> 
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
> 
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
> >
> > Even with these changes the code still looks somewhat brittle...
> 
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
> 
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
> 
> text = '''<start>
>   guava
> fruit
> <end>
> <start>
>   mango
> fruit
> <end>
> <start>
>   orange
> fruit
> <end>
> '''
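Putting the fixes from the exchange above together (start a new record only at a "<start>" line, keep the "<end>" line with its record, and print with sep="" and end="" so no spaces are inserted), a minimal sketch might look like this. It assumes, as in the example data, that every marker and data line sits on its own line and that any line containing "mango" should trigger removal of the whole record. records and hasmango are the names used in the thread; filter_records is a hypothetical helper added for illustration.

```python
from io import StringIO

# Sample input, written without a backslash after the opening quotes so that
# trailing whitespace cannot sneak in (the pitfall discussed above).
text = '''<start>
  guava
fruit
<end>
<start>
  mango
fruit
<end>
<start>
  orange
fruit
<end>
'''


def records(source):
    """Yield each <start> ... <end> group as a list of lines, markers included."""
    current = None
    for line in source:
        if line.startswith('<start>'):
            # A start marker always opens a fresh record; an unterminated
            # previous record, if any, is discarded.
            current = [line]
        elif current is not None:
            current.append(line)
            if line.startswith('<end>'):
                yield current  # complete record, end marker included
                current = None
        # Lines outside a <start> ... <end> pair (and an unterminated record
        # at the end of the input) are dropped in this sketch.


def hasmango(record):
    return any('mango' in line for line in record)


def filter_records(source, out):
    """Hypothetical helper: write every record without mango to file object out."""
    for record in records(source):
        if not hasmango(record):
            # sep='' and end='' because each line already ends with '\n'
            print(*record, sep='', end='', file=out)


out = StringIO()
filter_records(StringIO(text), out)
print(out.getvalue(), end='')
```

For real files the same helper can be given open file objects, e.g. `with open('in.txt') as src, open('out.txt', 'w') as dst: filter_records(src, dst)`, where the file names are placeholders.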

Hi Jussi,

I see you have written a function that fulfills the requirement. Can we do the same thing using an XML parser? I have failed to implement it with Python's XML parser when the file has content like the following:

<!DOCTYPE web-app 
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
    "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app>

and the entire thing works if it looks like this instead:
<!DOCTYPE web-app 
<web-app>

What I observe is that XML tree parsing does not work when the http lines shown above are present before <web-app>...
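As far as I know, Python's xml.etree.ElementTree does not retrieve the DTD named in a DOCTYPE, so the PUBLIC identifier and URL by themselves should not stop parsing. The sketch below checks that on a made-up web.xml: the DOCTYPE is copied from above, but the <servlet> and <servlet-name> elements and the "mango" test are assumptions standing in for the real data, which was not posted. It removes each direct child of the root whose text mentions mango, the XML counterpart of the record filter. Two caveats: ET.tostring does not write the DOCTYPE back out, and if the real root element carries an xmlns="..." attribute, ElementTree reports tags as {namespace}name, so plain names such as 'servlet' will not match.

```python
import xml.etree.ElementTree as ET

# Hypothetical web.xml content: the DOCTYPE is from the post, the servlet
# entries are invented to mirror the guava/mango/orange example.
doc = '''<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>guava</servlet-name></servlet>
  <servlet><servlet-name>mango</servlet-name></servlet>
  <servlet><servlet-name>orange</servlet-name></servlet>
</web-app>
'''

root = ET.fromstring(doc)
print(root.tag)  # the DOCTYPE with its http URL does not prevent parsing


def mentions(elem, word):
    """True if word occurs in the text of elem or any of its descendants."""
    return any(word in text for text in elem.itertext())


# Iterate over a snapshot (list(root)) so removing children skips none.
for child in list(root):
    if mentions(child, 'mango'):
        root.remove(child)

# Note: the DOCTYPE declaration is not included in this output.
result = ET.tostring(root, encoding='unicode')
print(result)
```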