| Newsgroups | comp.lang.python |
|---|---|
| Date | 2016-04-24 23:29 -0700 |
| References | (2 earlier) <lf5fuufqe81.fsf@ling.helsinki.fi> <991c5867-27d1-4e75-aa52-a7d47e626b74@googlegroups.com> <nfcqjs$guu$1@ger.gmane.org> <mailman.11.1461317067.2861.python-list@python.org> <lf57ffpdhl5.fsf@ling.helsinki.fi> |
| Message-ID | <ee696bf4-706f-4113-bb91-d231ebf47b05@googlegroups.com> (permalink) |
| Subject | Re: delete from pattern to pattern if it contains match |
| From | harirammanohar@gmail.com |
On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
>
> > harirammanohar@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar@gmail.com writes:
> >>>
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >>
> >>> >> Can you help me out with the following?
> >>> >>
> >>> >> file:
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> I need to delete everything from <start> to <end> if that block contains mango...
> >>> >>
> >>> >> output should be:
> >>> >>
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> Thank you
> >>> >
> >>> > Can anyone guide me? Why is XML tree parsing not working if I have
> >>> > root.tag and root.attrib, as mentioned in my earlier post...
> >>>
> >>> Assuming the real data consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>>
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>>
> >>> from io import StringIO
> >>>
> >>> text = '''\
> >>> <start>
> >>> guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>> mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>> orange
> >>> fruit
> >>> <end>
> >>> '''
> >>>
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>>
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>>
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >>
> >> Hi,
> >>
> >> Not working... this is the output I am getting:
> >>
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
>
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
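To see why a single trailing space after the backslash matters, here is a small sketch that runs the same snippet with and without that space. It uses exec() on source strings purely for demonstration; the names clean_src, spacey_src and run are made up for this example. On the Python versions I know of, '\ ' is an unrecognized escape that is kept as a literal backslash plus space (recent versions only warn about it), so the stray "\ " line becomes the first line of the first record.

```python
import warnings

# The same two-line string literal, differing only in one trailing space
# after the backslash that follows the opening triple quotes.
clean_src = "text = '''\\\n<start>\n'''"
spacey_src = "text = '''\\ \n<start>\n'''"

def run(src):
    """Execute a snippet of source code and return the resulting `text`."""
    namespace = {}
    with warnings.catch_warnings():
        # Recent Pythons warn about the invalid escape sequence '\ ';
        # silence that so only the resulting value is shown.
        warnings.simplefilter("ignore")
        exec(src, namespace)
    return namespace["text"]

print(repr(run(clean_src)))   # backslash-newline is a line continuation
print(repr(run(spacey_src)))  # backslash-space is kept literally
```

Without the space, the backslash swallows the newline and the text starts with "<start>"; with it, the text starts with a "\ " line.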
>
> >> <start>
> >> guava
> >> fruit
> >>
> >> <start>
> >> orange
> >> fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
>
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
>
> It was already known that this is not the actual format of the data.
>
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
>
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
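Here is a minimal sketch of a generator that opens a record only on a '<start>' line and closes it on an '<end>' line, keeping both markers. The marker defaults, the choice to restart on a second '<start>', and the choice to silently drop lines outside any block are my assumptions; when filtering a whole file you may prefer to pass such lines through instead.

```python
from io import StringIO

def records(source, start="<start>", end="<end>"):
    """Yield each block of lines from a start marker through its end marker.

    Lines outside any block are not yielded. A start marker seen inside an
    unfinished block discards the partial block and opens a new one.
    """
    current = None  # None means "not currently inside a block"
    for line in source:
        if line.startswith(start):
            current = [line]          # (re)open a block, keeping the marker
        elif current is not None:
            current.append(line)
            if line.startswith(end):
                yield current         # block complete, end marker included
                current = None

text = "stray\n<start>\nguava\n<end>\n<start>\nmango\n<end>\n"
for record in records(StringIO(text)):
    print(record)
```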
>
> >>> hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i.e. at the beginning of all
> > lines except the first) and adds a trailing newline.
>
> Yes, I forgot about the space. Sorry about that.
>
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
>
> print(*record, sep = "", end = "<end>\n")
>
> Or so:
>
> print(*record, sep = "")
> print("<end>")
>
> Or so:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     print("<end>")
>
> Or:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     if record and record[-1].strip() != "<end>":
>         print("<end>")
>
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
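A sketch of that plan, assuming a match means a whole line equal to "mango" (after stripping whitespace) rather than any line containing the substring. The helper names contains_line and drop_records are hypothetical; the records generator is Peter's variant that keeps the '<end>' line, plus a small addition that keeps trailing lines without an end marker. With a plain substring test a "mangosteen" block would be dropped too; with the full-line test it is kept.

```python
from io import StringIO

def records(source):
    """Group lines into lists, closing each list at an '<end>' line."""
    current = []
    for line in source:
        current.append(line)
        if line.startswith("<end>"):
            yield current
            current = []
    if current:  # keep any trailing lines that never saw an end marker
        yield current

def contains_line(record, word):
    """True if some line of the record is exactly `word` (ignoring
    surrounding whitespace), rather than merely containing it."""
    return any(line.strip() == word for line in record)

def drop_records(source, word, out):
    """Write every whole record that does not contain `word` as a full line."""
    for record in records(source):
        if not contains_line(record, word):
            out.write("".join(record))

text = ("<start>\nguava\n<end>\n"
        "<start>\nmango\n<end>\n"
        "<start>\nmangosteen\n<end>\n")
out = StringIO()
drop_records(StringIO(text), "mango", out)
print(out.getvalue(), end="")
```

To filter a real file, pass an open file object as the source and sys.stdout (or another open file) as out.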
>
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
>
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
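A small sketch that captures what print() writes, to show the difference Peter describes. The record list is made-up sample data; contextlib.redirect_stdout and io.StringIO are standard library.

```python
import io
from contextlib import redirect_stdout

record = ["<start>\n", "guava\n", "fruit\n", "<end>\n"]

def captured(**print_kwargs):
    """Return what print(*record, **print_kwargs) writes to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(*record, **print_kwargs)
    return buffer.getvalue()

# Default sep=" " puts a space before every line after the first,
# and default end="\n" adds an extra blank line after the record.
print(repr(captured()))
# Explicit empty separators reproduce the record exactly.
print(repr(captured(sep="", end="")))
```

Writing "".join(record) to the output gives the same result as the explicit separators.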
> >
> > Even with these changes code still looks somewhat brittle...
>
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
>
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
>
> text = '''<start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
> '''
Hi Jussi,
I have seen that you have written a function to fulfill the requirement. Can we do the same thing using an XML parser? I have failed to implement it with Python's XML parser when the file has content like the following:
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
and the entire thing works if it has the following:
<!DOCTYPE web-app
<web-app>
What I observe is that XML tree parsing does not work if the http parts are there in between the web-app lines...
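On the XML question: a minimal sketch with the standard xml.etree.ElementTree module, suggesting that a DOCTYPE with a PUBLIC identifier and an http URL does not by itself stop parsing, since ElementTree does not fetch the external DTD. The <servlet> elements and the "mango" criterion are invented for illustration, because the real file layout was not posted. One possible cause of root.tag trouble, if the real root element carries an xmlns="http://..." attribute, is that ElementTree then reports the tag as {namespace}web-app; the example.com URI below is hypothetical. Also note that ElementTree does not write the DOCTYPE back out when serializing, so it would have to be re-added by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical web.xml-like input: the DOCTYPE is copied from the post,
# but the <servlet>/<servlet-name> elements are invented for illustration.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>guava</servlet-name></servlet>
  <servlet><servlet-name>mango</servlet-name></servlet>
  <servlet><servlet-name>orange</servlet-name></servlet>
</web-app>
"""

# The external DTD is not downloaded and the DOCTYPE does not become an
# element, so the root element is parsed as usual.
root = ET.fromstring(SOURCE)
print(root.tag)  # web-app

# Collect the <servlet> children whose text mentions "mango" first,
# then remove them, so the tree is not modified while iterating over it.
doomed = [child for child in root
          if child.tag == "servlet" and "mango" in "".join(child.itertext())]
for child in doomed:
    root.remove(child)
print([child.findtext("servlet-name") for child in root])  # ['guava', 'orange']

# With a default namespace on the root (hypothetical URI), the tag
# includes the namespace in braces, so a plain "web-app" comparison fails.
namespaced = ET.fromstring('<web-app xmlns="http://example.com/ns"/>')
print(namespaced.tag)  # {http://example.com/ns}web-app
```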