
delete from pattern to pattern if it contains match

Started by	harirammanohar@gmail.com
First post	2016-04-18 00:07 -0700
Last post	2016-04-25 10:19 +0000
Articles	20 on this page of 29 — 4 participants


  delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 00:07 -0700
    RE: delete from pattern to pattern if it contains match Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-04-18 07:49 +0000
      Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 01:52 -0700
      Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 21:01 -0700
    Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-21 03:17 -0700
      Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-21 13:24 +0200
        Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 02:00 -0700
          Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 02:14 -0700
            Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-22 11:50 +0200
              Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-24 23:24 -0700
      Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-21 16:32 +0300
        Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 01:59 -0700
          Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-22 11:24 +0200
            Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-22 14:10 +0300
              Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-24 23:29 -0700
                Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 10:17 +0300
                  Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 02:49 -0700
                    Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 02:53 -0700
                      Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:37 +0300
                    Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-25 12:13 +0200
                      Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:39 +0300
                        Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 04:02 -0700
                          Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 14:28 +0300
                            Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 04:40 -0700
                              Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 15:00 +0300
                              Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-25 14:33 +0200
                                Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-26 03:31 -0700
                    Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:24 +0300
                    RE: delete from pattern to pattern if it contains match Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-04-25 10:19 +0000


#107246 — delete from pattern to pattern if it contains match

From	harirammanohar@gmail.com
Date	2016-04-18 00:07 -0700
Subject	delete from pattern to pattern if it contains match
Message-ID	<20c0b0fe-136b-4b01-b004-c55c6d47b299@googlegroups.com>

Hi all,

Can you help me with the following?

file: 
<start> 
 guava 
fruit 
<end> 
<start> 
 mango 
fruit 
<end> 
<start> 
 orange 
fruit 
<end> 

I need to delete everything from <start> to <end> if the block contains mango.

The output should be:

<start> 
 guava 
fruit 
<end> 
<start> 
 orange 
fruit 
<end> 

Thank you


#107251

From	Joaquin Alzola <Joaquin.Alzola@lebara.com>
Date	2016-04-18 07:49 +0000
Message-ID	<mailman.142.1460965767.6324.python-list@python.org>
In reply to	#107246

Hi,

Try to use the xml module.

import xml.etree.ElementTree as ET

That might help.

BR

Joaquin
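
A minimal sketch of this suggestion, not code from the thread. It assumes the data is well-formed XML, so the posted sample is adapted: each <end> becomes </start>, and a hypothetical <data> root element is wrapped around the records. The function name remove_matching is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample: <end> replaced by </start>,
# and a hypothetical <data> root element wrapped around the records.
SAMPLE = """<data>
<start>
 guava
fruit
</start>
<start>
 mango
fruit
</start>
<start>
 orange
fruit
</start>
</data>"""


def remove_matching(xml_text, word):
    """Return the XML with every <start> element whose text contains word removed."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the children: removing from root while
    # iterating over root itself would skip elements.
    for child in list(root):
        if child.tag == "start" and word in (child.text or ""):
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


result = remove_matching(SAMPLE, "mango")
print(result)
```

A plain substring test is used here; whether "mango" should match only as a whole word or line depends on the real data.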



#107256

From	harirammanohar@gmail.com
Date	2016-04-18 01:52 -0700
Message-ID	<1020e7f0-68ac-48a3-88f0-056f25f49b6c@googlegroups.com>
In reply to	#107251

On Monday, April 18, 2016 at 1:19:43 PM UTC+5:30, Joaquin Alzola wrote:
> Hi,
> 
> Try to use the xml module.
> 
> import xml.etree.ElementTree as ET
> 
> That might help.
> 
> BR
> 
> Joaquin

Hi Alzola,

Is there an easier way?


#107290

From	harirammanohar@gmail.com
Date	2016-04-18 21:01 -0700
Message-ID	<670498a7-7033-45e7-b6ac-015d4a40c34e@googlegroups.com>
In reply to	#107251

On Monday, April 18, 2016 at 1:19:43 PM UTC+5:30, Joaquin Alzola wrote:
> Hi,
> 
> Try to use the xml module.
> 
> import xml.etree.ElementTree as ET
> 
> That might help.
> 
> BR
> 
> Joaquin

Hi Alzola,

The XML parsing solution works only when the input file is in the format below:
<data>
<start> 
 guava 
fruit 
<end> 
<start> 
 mango 
fruit 
<end> 
<start> 
 orange 
fruit 
<end> 
</data>

It does not work if the input file is as below; the only change is in the opening tag of the root element:
<data xmlns="http://xmlns.jcp.org/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                      http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
  version="3.1">
<start> 
 guava 
fruit 
<end> 
<start> 
 mango 
fruit 
<end> 
<start> 
 orange 
fruit 
<end>
</data>

In this case it does not work. Please suggest what I have to do to make it work.
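
One possible cause, stated as an assumption because the failing code was not posted: when the root element declares a default namespace with xmlns="...", ElementTree stores every tag as "{namespace}name", so a lookup by the bare name finds nothing. The sketch below uses a well-formed stand-in for the sample (</start> instead of <end>) and keeps only the xmlns and version attributes from the post.

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.jcp.org/xml/ns/javaee"

# Well-formed stand-in for the sample, with the default namespace
# from the post declared on the root element.
SAMPLE = f"""<data xmlns="{NS}" version="3.1">
<start> guava fruit </start>
<start> mango fruit </start>
<start> orange fruit </start>
</data>"""

root = ET.fromstring(SAMPLE)
print(root.tag)  # the tag carries the namespace in braces

plain = root.findall("start")                # bare name: no matches
qualified = root.findall(f"{{{NS}}}start")   # namespace-qualified name

for child in qualified:
    if "mango" in (child.text or ""):
        root.remove(child)

remaining = [child.text.strip() for child in root]
print(remaining)
```

If the real script compares child.tag against a bare name such as "start", the comparison would fail in the same way once the namespace is present.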


#107446

From	harirammanohar@gmail.com
Date	2016-04-21 03:17 -0700
Message-ID	<91432d7b-7233-4504-a725-22bc81637ea3@googlegroups.com>
In reply to	#107246

Can anyone guide me? Why is XML tree parsing not working when the root element has a tag and attributes (root.tag and root.attrib) as mentioned in my earlier post?


#107447

From	Peter Otten <__peter__@web.de>
Date	2016-04-21 13:24 +0200
Message-ID	<mailman.9.1461237901.23626.python-list@python.org>
In reply to	#107446

harirammanohar@gmail.com wrote:

> On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30, hariram...@gmail.com
> wrote:
>> HI All,
>> 
>> can you help me out in doing below.
>> 
>> file:
>> <start>
>>  guava
>> fruit
>> <end>
>> <start>
>>  mango
>> fruit
>> <end>
>> <start>
>>  orange
>> fruit
>> <end>

Is that literally what you have in the file?

> any one can guide me ? why xml tree parsing is not working if i have
> root.tag and root.attrib as mentioned in earlier post...

The data above is not valid XML. Instead of

<start>...<end>

you need

<start>...</start>

i. e. the end tag must be the same as the start tag, but with a leading "/".
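
A quick way to check this, sketched with the standard library parser. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<start> mango fruit <end>"))     # two opening tags, none closed
print(is_well_formed("<start> mango fruit </start>"))  # matching start and end tag
```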


#107483

From	harirammanohar@gmail.com
Date	2016-04-22 02:00 -0700
Message-ID	<7fc52496-baf8-41ef-8f5e-76d409c6df84@googlegroups.com>
In reply to	#107447

@peter Yes, the sample here is not XML, but the real data is XML, believe me.


#107484

From	harirammanohar@gmail.com
Date	2016-04-22 02:14 -0700
Message-ID	<a6ff5e82-921c-419e-a07d-f40e46f79b8b@googlegroups.com>
In reply to	#107483

@peter The XML I have is similar to this one, for comparison:

https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt


#107486

From	Peter Otten <__peter__@web.de>
Date	2016-04-22 11:50 +0200
Message-ID	<mailman.12.1461318640.2861.python-list@python.org>
In reply to	#107484

harirammanohar@gmail.com wrote:

>> @peter yes here it is not xml, but real data is an xml..believe me..
> 
> @peter this is the similar xml i am having, you can correlate.
> 
> https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt

This is still too vague.

If you post the code you actually tried in a small standalone script 
together with a small sample xml file that produces the same failure as your 
actual data I or someone might help you fix it.


#107582

From	harirammanohar@gmail.com
Date	2016-04-24 23:24 -0700
Message-ID	<99c4bcaa-efc0-4127-b18e-61ea697558b5@googlegroups.com>
In reply to	#107486

Yes Peter, you are correct. I should have done that, at least by changing the strings, but I did not because the data here is restricted. So I used sample XML data instead; of course, the tags were missed.


#107451

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-21 16:32 +0300
Message-ID	<lf5fuufqe81.fsf@ling.helsinki.fi>
In reply to	#107446

harirammanohar@gmail.com writes:

> any one can guide me ? why xml tree parsing is not working if i have
> root.tag and root.attrib as mentioned in earlier post...

Assuming the real data consists of lines between a start marker and an end
marker, a winning plan is to collect a group of lines, deal with it, and
move on.

The following code implements something close to the plan. You need to
adapt it a bit to have your own source of lines and to restore the end
marker in the output and to account for your real use case and for
differences in taste and judgment. - The plan is as described above, but
there are many ways to implement it.

from io import StringIO

text = '''\
<start> 
  guava 
fruit 
<end> 
<start>
  mango
fruit
<end>
<start> 
  orange 
fruit 
<end> 
'''

def records(source):
    current = []
    for line in source:
        if line.startswith('<end>'):
            yield current
            current = []
        else:
            current.append(line)

def hasmango(record):
    return any('mango' in it for it in record)

for record in records(StringIO(text)):
    hasmango(record) or print(*record)


#107482

From	harirammanohar@gmail.com
Date	2016-04-22 01:59 -0700
Message-ID	<991c5867-27d1-4e75-aa52-a7d47e626b74@googlegroups.com>
In reply to	#107451

Hi,

It is not working. This is the output I am getting:

\
 <start>
   guava
 fruit

<start>
   orange
 fruit


#107485

From	Peter Otten <__peter__@web.de>
Date	2016-04-22 11:24 +0200
Message-ID	<mailman.11.1461317067.2861.python-list@python.org>
In reply to	#107482

harirammanohar@gmail.com wrote:

> Hi,
> 
> not working....this is the output i am getting...
> 
> \

This means that the line

>> text = '''\

has trailing whitespace in your copy of the script.

>  <start>
>    guava
>  fruit
> 
> <start>
>    orange
>  fruit

Jussi forgot to add the "<end>..." line to the group. To fix this, change the
generator to

def records(source):
    current = []
    for line in source:
        current.append(line)
        if line.startswith('<end>'):
            yield current
            current = []


>>     hasmango(record) or print(*record)

The

print(*record)

inserts spaces between record entries (i. e. at the beginning of all lines 
except the first) and adds a trailing newline. You can avoid this by 
specifying the delimiters explicitly:

if not hasmango(record):
    print(*record, sep="", end="")

Even with these changes code still looks somewhat brittle...
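
Putting the two changes together gives roughly the following sketch. It assumes the line-oriented sample format from the thread; the helper filtered() is a made-up name that collects the output into one string so it can be inspected before printing.

```python
from io import StringIO

TEXT = """<start>
 guava
fruit
<end>
<start>
 mango
fruit
<end>
<start>
 orange
fruit
<end>
"""


def records(source):
    """Yield lists of lines, each list ending with its '<end>' line."""
    current = []
    for line in source:
        current.append(line)
        if line.startswith("<end>"):
            yield current
            current = []


def hasmango(record):
    """True if any line of the record contains the substring 'mango'."""
    return any("mango" in line for line in record)


def filtered(source):
    """Join all records without 'mango' back into a single string."""
    return "".join("".join(r) for r in records(source) if not hasmango(r))


output = filtered(StringIO(TEXT))
print(output, end="")  # lines already end in "\n", so add no extra newline
```

Lines after the last '<end>' are silently dropped by this generator, which is one reason the approach remains brittle.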


#107487

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-22 14:10 +0300
Message-ID	<lf57ffpdhl5.fsf@ling.helsinki.fi>
In reply to	#107485

Peter Otten writes:

> harirammanohar@gmail.com wrote:
>
>> Hi,
>> 
>> not working....this is the output i am getting...
>> 
>> \
>
> This means that the line
>
>>> text = '''\
>
> has trailing whitespace in your copy of the script.

That's a nuisance. I wish otherwise undefined escape sequences in
strings raised an error, similar to a stray space after a line
continuation character.

>>  <start>
>>    guava
>>  fruit
>> 
>> <start>
>>    orange
>>  fruit
>
> Jussi forgot to add the "<end>..." line to the group.

I didn't forget. I meant what I said when I said the OP needs to adapt
the code to (among other things) restore the end marker in the output.
If they can't be bothered to do anything at all, it's their problem.

It was already known that this is not the actual format of the data.

> To fix this change the generator to
>
> def records(source):
>     current = []
>     for line in source:
>         current.append(line)
>         if line.startswith('<end>'):
>             yield current
>             current = []

Oops, I notice that I forgot to start a new record only on encountering
a '<start>' line. That should probably be done, unless the format is
intended to be exactly a sequence of "<start>\n- -\n<end>\n".
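
A sketch of that variant, under the assumption that anything outside a '<start>' ... '<end>' pair should be ignored. The sample list of lines is invented for illustration.

```python
def records(source):
    """Yield complete <start>...<end> records; ignore lines outside them."""
    current = None  # None means "not inside a record"
    for line in source:
        if line.startswith("<start>"):
            current = [line]  # begin a new record, dropping any unfinished one
        elif current is not None:
            current.append(line)
            if line.startswith("<end>"):
                yield current
                current = None


# Invented input with stray lines before and between the records.
lines = ["header\n", "<start>\n", " guava\n", "<end>\n",
         "stray\n", "<start>\n", " mango\n", "<end>\n"]
found = list(records(lines))
print(found)
```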

>>>     hasmango(record) or print(*record)
>
> The
>
> print(*record)
>
> inserts spaces between record entries (i. e. at the beginning of all
> lines except the first) and adds a trailing newline.

Yes, I forgot about the space. Sorry about that.

The final newline was intentional. Perhaps I should have added the end
marker there instead (given my preference to not drag it together with
the data lines), like so:

   print(*record, sep = "", end = "<end>\n")

Or so:

   print(*record, sep = "")
   print("<end>")

Or so:

   for line in record:
       print(line.rstrip("\n"))
   else:
       print("<end>")

Or:

   for line in record:
       print(line.rstrip("\n"))
   else:
       if record and not record[-1].strip() == "<end>":
           print("<end>")

But all this is beside the point that to deal with the stated problem
one might want to obtain access to a whole record *first*, then check if
it contains "mango" in the intended way (details missing but at least
"mango\n" as a full line counts as an occurrence), and only *then* print
the whole record (if it doesn't contain "mango").
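
If only a full line "mango" should count, the test on the whole record could be tightened as in this sketch. Ignoring surrounding whitespace is an assumption about the intended match, not something stated in the thread.

```python
def hasmango(record):
    """True if some line of the record is exactly 'mango', ignoring surrounding whitespace."""
    return any(line.strip() == "mango" for line in record)


print(hasmango(["<start>\n", " mango\n", "fruit\n"]))       # a full-line match
print(hasmango(["<start>\n", " mangosteen\n", "fruit\n"]))  # only a substring match
```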

I can think of two other ways - one if the data can be accessed only
once - but they seem more complicated to me. Hm, well, if it's XML, as
stated in another branch of this thread and contrary to the form of the
example data in this branch, there's a third way that may be good, but
here I'm responding to a line-oriented format.

> You can avoid this by specifying the delimiters explicitly:
>
> if not hasmango(record):
>     print(*record, sep="", end="")
>
> Even with these changes code still looks somewhat brittle...

That depends on the actual data format, and on what really is intended
to trigger the filter. This approach is a complete waste of effort if
there are no guarantees of things being there on their own lines, for
example.

Ok, that "\ " not only looks brittle but actually is brittle. The one
time I used that slash, I now regret doing so. Here's a fixed version.
(Not sure of the significance of the number of spaces that start the
first data line. They seem to have doubled along the way.)

text = '''<start>
  guava
fruit
<end>
<start>
  mango
fruit
<end>
<start>
  orange
fruit
<end>
'''


#107583

From	harirammanohar@gmail.com
Date	2016-04-24 23:29 -0700
Message-ID	<ee696bf4-706f-4113-bb91-d231ebf47b05@googlegroups.com>
In reply to	#107487

On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
> 
> > harirammanohar@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar@gmail.com writes:
> >>> 
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >> 
> >>> >> can you help me out in doing below.
> >>> >> 
> >>> >> file:
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> need to delete from start to end if it contains mango in a file...
> >>> >> 
> >>> >> output should be:
> >>> >> 
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> Thank you
> >>> >
> >>> > any one can guide me ? why xml tree parsing is not working if i have
> >>> > root.tag and root.attrib as mentioned in earlier post...
> >>> 
> >>> Assuming the real consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>> 
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>> 
> >>> from io import StringIO
> >>> 
> >>> text = '''\
> >>> <start>
> >>>   guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   orange
> >>> fruit
> >>> <end>
> >>> '''
> >>> 
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>> 
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>> 
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >> 
> >> Hi,
> >> 
> >> not working....this is the output i am getting...
> >> 
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
> 
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
> 
> >>  <start>
> >>    guava
> >>  fruit
> >> 
> >> <start>
> >>    orange
> >>  fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
> 
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
> 
> It was already known that this is not the actual format of the data.
> 
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
> 
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
> 
> >>>     hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i. e. at the beginning of all
> > lines except the first) and adds a trailing newline.
> 
> Yes, I forgot about the space. Sorry about that.
> 
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
> 
>    print(*record, sep = "", end = "<end>\n")
> 
> Or so:
> 
>    print(*record, sep = "")
>    print("<end>")
> 
> Or so:
> 
>    for line in record:
>        print(line.rstrip("\n")
>    else:
>        print("<end>")
> 
> Or:
> 
>    for line in record:
>        print(line.rstrip("\n")
>    else:
>        if record and not record[-1].strip() == "<end>":
>            print("<end>")
> 
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
> 
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
> 
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
> >
> > Even with these changes code still looks somewhat brittle...
> 
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
> 
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
> 
> text = '''<start>
>   guava
> fruit
> <end>
> <start>
>   mango
> fruit
> <end>
> <start>
>   orange
> fruit
> <end>
> '''
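
The record-first approach described in the quoted text above (collect a whole record, test it, and only then print it) can be sketched roughly as follows. This is a sketch under assumptions, not the code from the thread: it assumes the markers sit on their own lines, that "mango" counts only when it is a whole line after stripping whitespace, and that a record opens only on a '<start>' line. The name drop_mango is made up for illustration; records and hasmango are the names used in the quoted code.

```python
def records(source):
    """Yield each <start>...<end> block as a list of lines, markers included."""
    current = None                      # None means "not inside a record"
    for line in source:
        marker = line.strip()
        if marker == "<start>":
            current = [line]            # open a new record on <start> only
        elif current is not None:
            current.append(line)
            if marker == "<end>":
                yield current
                current = None


def hasmango(record):
    """True if some line of the record is exactly 'mango' after stripping."""
    return any(line.strip() == "mango" for line in record)


def drop_mango(source):
    """Return the text of all records that do not contain a 'mango' line."""
    return "".join(
        line
        for record in records(source)
        if not hasmango(record)
        for line in record
    )


text = """<start>
  guava
fruit
<end>
<start>
  mango
fruit
<end>
<start>
  orange
fruit
<end>
"""

print(drop_mango(text.splitlines(keepends=True)), end="")
```

Because each record keeps its own '<start>' and '<end>' lines, nothing has to be restored when printing, and joining the lines with an empty separator avoids the extra spaces that print(*record) inserts.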

Hi Jussi,

I have seen that you have written a function definition to fulfill the requirement. Can we do the same thing using an XML parser? I have failed to implement this with Python's XML parser when the file has content like the following...

<!DOCTYPE web-app 
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
    "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app>

The entire thing works if it has the following instead:
<!DOCTYPE web-app 
<web-app>

What I observe is that XML tree parsing does not work if http tags are there in between web-app...


#107584

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-25 10:17 +0300
Message-ID	<lf5d1pew42b.fsf@ling.helsinki.fi>
In reply to	#107583

harirammanohar@gmail.com writes:

> Hi Jussi,
>
> i have seen you have written a definition to fulfill the requirement,
> can we do this same thing using xml parser, as i have failed to
> implement the thing using xml parser of python if the file is having
> the content as below...
>
> <!DOCTYPE web-app 
>     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
>     "http://java.sun.com/dtd/web-app_2_3.dtd">
>
> <web-app>
>
> and entire thing works if it has as below:
> <!DOCTYPE web-app 
> <web-app>
>
> what i observe is xml tree parsing is not working if http tags are
> there in between web-app...

Do you get an error message?

My guess is that the parser needs the DTD but cannot access it. There
appears to be a DTD at that address, http://java.sun.com/... (it
redirects to Oracle, who bought Sun a while ago), but something might
prevent the parser from accessing it by default. If so, the details
depend on what parser you are trying to use. It may be possible to save
that DTD as a local file and point the parser to that.

Your problem is morphing rather wildly. A previous version had namespace
declarations but no DTD or XSD if I remember right. The initial version
wasn't XML at all.

If you post (1) an actual, minimal document, (2) the actual Python
commands that fail to parse it, and (3) the error message you get,
someone will be able to help you. The content of the document need not
be more than "hello, world" level. The DOCTYPE declaration and the
outermost tags with all their attributes and namespace declarations, if
any, are important.
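
A report of the kind requested above might look roughly like the sketch below: a tiny document, the exact parsing commands, and whatever error (if any) comes out. The document body is invented for illustration; only the DOCTYPE is taken from the thread. It also assumes the standard library's xml.etree.ElementTree is the parser in question, which the thread had not yet confirmed at this point.

```python
import xml.etree.ElementTree as ET

# Hypothetical "hello, world" document that keeps the DOCTYPE from the thread.
DOCUMENT = """<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet>
    <servlet-name>hello</servlet-name>
  </servlet>
</web-app>
"""

try:
    root = ET.fromstring(DOCUMENT)
except ET.ParseError as error:
    # This message is item (3): the exact error text to include in the post.
    print("parse failed:", error)
else:
    # If parsing succeeds, show what the parser actually produced.
    print("root tag:", root.tag)
    print("servlets found:", len(root.findall("servlet")))
```

Running something this small separates "the parser rejects the DOCTYPE" from "the parser accepts the file but the search finds nothing", which are different problems with different fixes.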


#107589

From	harirammanohar@gmail.com
Date	2016-04-25 02:49 -0700
Message-ID	<8001ac2b-c883-4ca1-a163-d118cc82295b@googlegroups.com>
In reply to	#107584

Hi Jussi,

Here is an input file, sample.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                      http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
  version="3.1">
    <servlet>
      <servlet-name>controller</servlet-name>
      <servlet-class>com.mycompany.mypackage.ControllerServlet</servlet-class>
      <init-param>
        <param-name>listOrders</param-name>
        <param-value>com.mycompany.myactions.ListOrdersAction</param-value>
      </init-param>
      <init-param>
        <param-name>saveCustomer</param-name>
        <param-value>com.mycompany.myactions.SaveCustomerAction</param-value>
      </init-param>
      <load-on-startup>5</load-on-startup>
    </servlet>


    <servlet-mapping>
      <servlet-name>graph</servlet-name>
      <url-pattern>/graph</url-pattern>
    </servlet-mapping>


    <session-config>
      <session-timeout>30</session-timeout>
    </session-config>
</web-app>

--------------------------------
Here is the code:

import xml.etree.ElementTree as ET
ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
tree = ET.parse('sample.xml')
root = tree.getroot()

for servlet in root.findall('servlet'):
        servletname = servlet.find('servlet-name').text
        if servletname == "controller":
                root.remove(servlet)

tree.write('output.xml')

This works if <web-app> </web-app> doesn't have the attributes below...

xmlns="http://xmlns.jcp.org/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                      http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"


#107591

From	harirammanohar@gmail.com
Date	2016-04-25 02:53 -0700
Message-ID	<cd20025d-582b-4844-9334-23b334186c9c@googlegroups.com>
In reply to	#107589

By the way, I didn't get any error message, and I am using version 3.4.3.


#107595

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-25 13:37 +0300
Message-ID	<lf5lh42rn2k.fsf@ling.helsinki.fi>
In reply to	#107591

harirammanohar@gmail.com writes:

> On Monday, April 25, 2016 at 3:19:15 PM UTC+5:30, hariram...@gmail.com wrote:

[- -]

>> Here is the code:
>> 
>> import xml.etree.ElementTree as ET
>> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
>> tree = ET.parse('sample.xml')
>> root = tree.getroot()
>> 
>> for servlet in root.findall('servlet'):
>>         servletname = servlet.find('servlet-name').text
>>         if servletname == "controller":
>>                 root.remove(servlet)
>> 
>> tree.write('output.xml')

[- -]

> By the way i didnt get any error message and i am using version 3.4.3

Right. The parsing succeeds but no 'servlet' elements are found and the
loop simply has no effect. I may be missing some technical detail, but I
think the 'servlet' elements in the document are in the default
namespace (because one was declared) while your .findall and .find calls
are looking for a 'servlet' element that is in no namespace at all. I
seem to remember that there is such a distinction in XML.
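
The distinction can be checked directly. The sketch below uses a cut-down stand-in for sample.xml (same default namespace, one servlet; the shortened document is an assumption made for illustration) and compares a bare-name search with a namespace-qualified one.

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.jcp.org/xml/ns/javaee"

# Cut-down stand-in for sample.xml: same default namespace, one servlet.
DOCUMENT = (
    '<web-app xmlns="%s" version="3.1">'
    "<servlet><servlet-name>controller</servlet-name></servlet>"
    "</web-app>" % NS
)

root = ET.fromstring(DOCUMENT)

# ElementTree stores the namespace URI as part of every tag name.
print(root.tag)
print([child.tag for child in root])

# A bare name looks for an element in no namespace at all.
unqualified = root.findall("servlet")
# The "{uri}name" form looks for the element inside the namespace.
qualified = root.findall("{%s}servlet" % NS)
print(len(unqualified), len(qualified))
```

Printing the tags is a quick way to see which names the search functions actually expect.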


#107593

From	Peter Otten <__peter__@web.de>
Date	2016-04-25 12:13 +0200
Message-ID	<mailman.70.1461579263.32212.python-list@python.org>
In reply to	#107589

harirammanohar@gmail.com wrote:

> Here is the code:

Finally ;)

> import xml.etree.ElementTree as ET
> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")

I don't know what this does, but probably not what you expected.
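
For what it is worth, register_namespace seems to matter only when the tree is written back out: it chooses the prefix used in the serialized text and leaves searching untouched. A small sketch, using a shortened stand-in document that is assumed here purely for illustration:

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.jcp.org/xml/ns/javaee"
DOCUMENT = '<web-app xmlns="%s"><servlet/></web-app>' % NS

root = ET.fromstring(DOCUMENT)

# Serialized before registering: a generated prefix (ns0 and so on) is used.
before = ET.tostring(root, encoding="unicode")

# Register the empty prefix so the namespace is written as the default one.
# Note that this registry is global to the process.
ET.register_namespace("", NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)

# Registration does not change how searches work: the bare name still misses.
print(len(root.findall("servlet")))
```

So the call is useful for keeping output.xml free of generated prefixes, but it is not a substitute for qualifying the names passed to find and findall.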

> tree = ET.parse('sample.xml')
> root = tree.getroot()
> 
> for servlet in root.findall('servlet'):
>         servletname = servlet.find('servlet-name').text

I think you have to specify the namespace:

for servlet in root.findall('{http://xmlns.jcp.org/xml/ns/javaee}servlet'):
    servletname = servlet.find(
        '{http://xmlns.jcp.org/xml/ns/javaee}servlet-name').text

>         if servletname == "controller":

You could have added a print statement to verify that the line below is 
executed.

>                 root.remove(servlet)
> 
> tree.write('output.xml')
> 
> This will work if <web-app> </web-app> doesnt have below...
> 
> xmlns="http://xmlns.jcp.org/xml/ns/javaee"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
>                       http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
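
Put together, a namespace-aware version of the script might look like the sketch below. It is not tested against the real sample.xml: the document here is a shortened stand-in, the prefix "jee" is an arbitrary label used only in the search paths, and remove_servlet is a helper name invented for illustration.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://xmlns.jcp.org/xml/ns/javaee"
NS = {"jee": NS_URI}    # "jee" is an arbitrary local prefix, used only in searches


def remove_servlet(root, name):
    """Remove <servlet> children of root whose <servlet-name> equals name."""
    removed = 0
    # findall returns a list, so removing from root inside the loop is safe.
    for servlet in root.findall("jee:servlet", NS):
        if servlet.findtext("jee:servlet-name", namespaces=NS) == name:
            root.remove(servlet)
            removed += 1
    return removed


# Cut-down stand-in for sample.xml from the thread.
DOCUMENT = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
  <servlet><servlet-name>controller</servlet-name></servlet>
  <servlet><servlet-name>graph</servlet-name></servlet>
</web-app>"""

ET.register_namespace("", NS_URI)       # keep xmlns="..." unprefixed on output
root = ET.fromstring(DOCUMENT)
print("removed:", remove_servlet(root, "controller"))
print(ET.tostring(root, encoding="unicode"))
```

Returning the number of removed elements serves the same purpose as the suggested print statement: it shows whether the removal branch was ever reached.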
