Groups > comp.lang.python > #107246 > unrolled thread
| Started by | harirammanohar@gmail.com |
|---|---|
| First post | 2016-04-18 00:07 -0700 |
| Last post | 2016-04-25 10:19 +0000 |
| Articles | 20 on this page of 29 — 4 participants |
delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 00:07 -0700
RE: delete from pattern to pattern if it contains match Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-04-18 07:49 +0000
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 01:52 -0700
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-18 21:01 -0700
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-21 03:17 -0700
Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-21 13:24 +0200
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 02:00 -0700
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 02:14 -0700
Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-22 11:50 +0200
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-24 23:24 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-21 16:32 +0300
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-22 01:59 -0700
Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-22 11:24 +0200
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-22 14:10 +0300
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-24 23:29 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 10:17 +0300
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 02:49 -0700
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 02:53 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:37 +0300
Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-25 12:13 +0200
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:39 +0300
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 04:02 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 14:28 +0300
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-25 04:40 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 15:00 +0300
Re: delete from pattern to pattern if it contains match Peter Otten <__peter__@web.de> - 2016-04-25 14:33 +0200
Re: delete from pattern to pattern if it contains match harirammanohar@gmail.com - 2016-04-26 03:31 -0700
Re: delete from pattern to pattern if it contains match Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-25 13:24 +0300
RE: delete from pattern to pattern if it contains match Joaquin Alzola <Joaquin.Alzola@lebara.com> - 2016-04-25 10:19 +0000
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-18 00:07 -0700 |
| Subject | delete from pattern to pattern if it contains match |
| Message-ID | <20c0b0fe-136b-4b01-b004-c55c6d47b299@googlegroups.com> |
HI All,
can you help me out in doing below.
file:
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
need to delete from start to end if it contains mango in a file...
output should be:
<start>
guava
fruit
<end>
<start>
orange
fruit
<end>
Thank you
| From | Joaquin Alzola <Joaquin.Alzola@lebara.com> |
|---|---|
| Date | 2016-04-18 07:49 +0000 |
| Message-ID | <mailman.142.1460965767.6324.python-list@python.org> |
| In reply to | #107246 |
Hi,
Try to use the xml module.
import xml.etree.ElementTree as ET
That might help.
BR
Joaquin
-----Original Message-----
From: Python-list [mailto:python-list-bounces+joaquin.alzola=lebara.com@python.org] On Behalf Of harirammanohar@gmail.com
Sent: 18 April 2016 08:08
To: python-list@python.org
Subject: delete from pattern to pattern if it contains match
HI All,
can you help me out in doing below.
file:
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
need to delete from start to end if it contains mango in a file...
output should be:
<start>
guava
fruit
<end>
<start>
orange
fruit
<end>
Thank you
--
https://mail.python.org/mailman/listinfo/python-list
This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-18 01:52 -0700 |
| Message-ID | <1020e7f0-68ac-48a3-88f0-056f25f49b6c@googlegroups.com> |
| In reply to | #107251 |
On Monday, April 18, 2016 at 1:19:43 PM UTC+5:30, Joaquin Alzola wrote:
> Hi,
>
> Try to use the xml module.
>
> import xml.etree.ElementTree as ET
>
> That might help.
>
> BR
>
> Joaquin
>
> -----Original Message-----
>
>
> HI All,
>
> can you help me out in doing below.
>
> file:
> <start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> need to delete from start to end if it contains mango in a file...
>
> output should be:
>
> <start>
> guava
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> Thank you
> --
> https://mail.python.org/mailman/listinfo/python-list
> This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
Hi Alzola,
Still any easier way ??
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-18 21:01 -0700 |
| Message-ID | <670498a7-7033-45e7-b6ac-015d4a40c34e@googlegroups.com> |
| In reply to | #107251 |
On Monday, April 18, 2016 at 1:19:43 PM UTC+5:30, Joaquin Alzola wrote:
> Hi,
>
> Try to use the xml module.
>
> import xml.etree.ElementTree as ET
>
> That might help.
>
> BR
>
> Joaquin
>
> -----Original Message-----
> From: Python-list [mailto:python-list-bounces+joaquin.alzola=lebara.com@python.org] On Behalf Of harirammanohar@gmail.com
> Sent: 18 April 2016 08:08
> To: python-list@python.org
> Subject: delete from pattern to pattern if it contains match
>
>
> HI All,
>
> can you help me out in doing below.
>
> file:
> <start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> need to delete from start to end if it contains mango in a file...
>
> output should be:
>
> <start>
> guava
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> Thank you
> --
> https://mail.python.org/mailman/listinfo/python-list
> This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
Hi Alzola,
The XML parsing solution works only when the input file is in the format below.
<data>
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
</data>
It does not work if the input file is as below, with just a change in the starting header...
<data xmlns="http://xmlns.jcp.org/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
version="3.1">
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
</data>
In this case it is not working.... Please suggest what I have to do to make it work.
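A possible explanation, sketched under assumptions: a default `xmlns` on the root element puts every child element into that namespace, so ElementTree sees the tag as `{uri}start` and an unqualified search for `start` matches nothing. The document below is a hypothetical, well-formed stand-in for the posted file (it uses `</start>` closing tags, since the posted `<end>` form would not parse at all), and the `ee` prefix and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the posted file.
doc = """<data xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
  <start>guava fruit</start>
  <start>mango fruit</start>
  <start>orange fruit</start>
</data>"""

# Prefix map used only for lookups; the prefix name "ee" is arbitrary.
NS = {"ee": "http://xmlns.jcp.org/xml/ns/javaee"}

root = ET.fromstring(doc)

# An unqualified name finds nothing because the elements are namespaced.
plain_hits = root.findall("start")

# A qualified lookup through the prefix map finds all three elements.
qualified_hits = root.findall("ee:start", NS)

# Remove every <start> element whose text mentions "mango".
for elem in qualified_hits:
    if "mango" in (elem.text or ""):
        root.remove(elem)

remaining = [e.text.strip() for e in root.findall("ee:start", NS)]
print(len(plain_hits), len(qualified_hits), remaining)
```

If the unqualified search returns an empty list on the real file while the qualified one does not, the namespace declaration is the likely culprit.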
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-21 03:17 -0700 |
| Message-ID | <91432d7b-7233-4504-a725-22bc81637ea3@googlegroups.com> |
| In reply to | #107246 |
On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30, hariram...@gmail.com wrote:
> HI All,
>
> can you help me out in doing below.
>
> file:
> <start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> need to delete from start to end if it contains mango in a file...
>
> output should be:
>
> <start>
> guava
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
>
> Thank you
any one can guide me ? why xml tree parsing is not working if i have root.tag and root.attrib as mentioned in earlier post...
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-21 13:24 +0200 |
| Message-ID | <mailman.9.1461237901.23626.python-list@python.org> |
| In reply to | #107446 |
harirammanohar@gmail.com wrote:
> On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30, hariram...@gmail.com
> wrote:
>> HI All,
>>
>> can you help me out in doing below.
>>
>> file:
>> <start>
>> guava
>> fruit
>> <end>
>> <start>
>> mango
>> fruit
>> <end>
>> <start>
>> orange
>> fruit
>> <end>
Is that literally what you have in the file?
> any one can guide me ? why xml tree parsing is not working if i have
> root.tag and root.attrib as mentioned in earlier post...
The data above is not valid xml. Instead of
<start>...<end>
you need
<start>...</start>
i. e. the end tag must be the same as the start tag, but with a leading "/".
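The point about matching end tags can be checked quickly with the standard-library parser. This is only a sketch: the two sample strings are made up for illustration, and `is_well_formed` is a hypothetical helper name, not something from the thread.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if ElementTree can parse text, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# <start> is "closed" with <end>, which really opens a new element.
bad = "<data><start>mango fruit<end></data>"

# <start> is closed with the matching </start> tag.
good = "<data><start>mango fruit</start></data>"

print(is_well_formed(bad), is_well_formed(good))
```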
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-22 02:00 -0700 |
| Message-ID | <7fc52496-baf8-41ef-8f5e-76d409c6df84@googlegroups.com> |
| In reply to | #107447 |
On Thursday, April 21, 2016 at 4:55:18 PM UTC+5:30, Peter Otten wrote:
> harirammanohar@gmail.com wrote:
>
> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30, hariram...@gmail.com
> > wrote:
> >> HI All,
> >>
> >> can you help me out in doing below.
> >>
> >> file:
> >> <start>
> >> guava
> >> fruit
> >> <end>
> >> <start>
> >> mango
> >> fruit
> >> <end>
> >> <start>
> >> orange
> >> fruit
> >> <end>
>
> Is that literally what you have in the file?
>
> > any one can guide me ? why xml tree parsing is not working if i have
> > root.tag and root.attrib as mentioned in earlier post...
>
> The data above is not valid xml. Instead of
>
> <start>...<end>
>
> you need
>
> <start>...</start>
>
> i. e. the end tag must be the same as the start tag, but with a leading "/".
@peter yes here it is not xml, but real data is an xml..believe me..
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-22 02:14 -0700 |
| Message-ID | <a6ff5e82-921c-419e-a07d-f40e46f79b8b@googlegroups.com> |
| In reply to | #107483 |
On Friday, April 22, 2016 at 2:30:45 PM UTC+5:30, hariram...@gmail.com wrote:
> On Thursday, April 21, 2016 at 4:55:18 PM UTC+5:30, Peter Otten wrote:
> > harirammanohar@gmail.com wrote:
> >
> > > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30, hariram...@gmail.com
> > > wrote:
> > >> HI All,
> > >>
> > >> can you help me out in doing below.
> > >>
> > >> file:
> > >> <start>
> > >> guava
> > >> fruit
> > >> <end>
> > >> <start>
> > >> mango
> > >> fruit
> > >> <end>
> > >> <start>
> > >> orange
> > >> fruit
> > >> <end>
> >
> > Is that literally what you have in the file?
> >
> > > any one can guide me ? why xml tree parsing is not working if i have
> > > root.tag and root.attrib as mentioned in earlier post...
> >
> > The data above is not valid xml. Instead of
> >
> > <start>...<end>
> >
> > you need
> >
> > <start>...</start>
> >
> > i. e. the end tag must be the same as the start tag, but with a leading "/".
>
> @peter yes here it is not xml, but real data is an xml..believe me..
@peter this is the similar xml i am having, you can correlate.
https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-22 11:50 +0200 |
| Message-ID | <mailman.12.1461318640.2861.python-list@python.org> |
| In reply to | #107484 |
harirammanohar@gmail.com wrote:
>> @peter yes here it is not xml, but real data is an xml..believe me..
>
> @peter this is the similar xml i am having, you can correlate.
>
> https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt
This is still too vague.
If you post the code you actually tried in a small standalone script
together with a small sample xml file that produces the same failure as your
actual data I or someone might help you fix it.
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-24 23:24 -0700 |
| Message-ID | <99c4bcaa-efc0-4127-b18e-61ea697558b5@googlegroups.com> |
| In reply to | #107486 |
On Friday, April 22, 2016 at 3:20:53 PM UTC+5:30, Peter Otten wrote:
> harirammanohar@gmail.com wrote:
>
> >> @peter yes here it is not xml, but real data is an xml..believe me..
> >
> > @peter this is the similar xml i am having, you can correlate.
> >
> > https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt
>
> This is still too vague.
>
> If you post the code you actually tried in a small standalone script
> together with a small sample xml file that produces the same failure as your
> actual data I or someone might help you fix it.
yeah peter you are correct, i would have done that atleast by changing the strings, but i wasnt as here its an restricted data and the purpose...so i have taken sample xml data...ofcourse tags are missed..
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-21 16:32 +0300 |
| Message-ID | <lf5fuufqe81.fsf@ling.helsinki.fi> |
| In reply to | #107446 |
harirammanohar@gmail.com writes:
> On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> hariram...@gmail.com wrote:
>> HI All,
>>
>> can you help me out in doing below.
>>
>> file:
>> <start>
>> guava
>> fruit
>> <end>
>> <start>
>> mango
>> fruit
>> <end>
>> <start>
>> orange
>> fruit
>> <end>
>>
>> need to delete from start to end if it contains mango in a file...
>>
>> output should be:
>>
>> <start>
>> guava
>> fruit
>> <end>
>> <start>
>> orange
>> fruit
>> <end>
>>
>> Thank you
>
> any one can guide me ? why xml tree parsing is not working if i have
> root.tag and root.attrib as mentioned in earlier post...
Assuming the real data consists of lines between a start marker and end
marker, a winning plan is to collect a group of lines, deal with it, and
move on.
The following code implements something close to the plan. You need to
adapt it a bit to have your own source of lines and to restore the end
marker in the output and to account for your real use case and for
differences in taste and judgment. - The plan is as described above, but
there are many ways to implement it.
from io import StringIO
text = '''\
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
'''
def records(source):
    current = []
    for line in source:
        if line.startswith('<end>'):
            yield current
            current = []
        else:
            current.append(line)

def hasmango(record):
    return any('mango' in it for it in record)

for record in records(StringIO(text)):
    hasmango(record) or print(*record)
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-22 01:59 -0700 |
| Message-ID | <991c5867-27d1-4e75-aa52-a7d47e626b74@googlegroups.com> |
| In reply to | #107451 |
On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen wrote:
> harirammanohar@gmail.com writes:
>
> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> > hariram...@gmail.com wrote:
> >> HI All,
> >>
> >> can you help me out in doing below.
> >>
> >> file:
> >> <start>
> >> guava
> >> fruit
> >> <end>
> >> <start>
> >> mango
> >> fruit
> >> <end>
> >> <start>
> >> orange
> >> fruit
> >> <end>
> >>
> >> need to delete from start to end if it contains mango in a file...
> >>
> >> output should be:
> >>
> >> <start>
> >> guava
> >> fruit
> >> <end>
> >> <start>
> >> orange
> >> fruit
> >> <end>
> >>
> >> Thank you
> >
> > any one can guide me ? why xml tree parsing is not working if i have
> > root.tag and root.attrib as mentioned in earlier post...
>
> Assuming the real consists of lines between a start marker and end
> marker, a winning plan is to collect a group of lines, deal with it, and
> move on.
>
> The following code implements something close to the plan. You need to
> adapt it a bit to have your own source of lines and to restore the end
> marker in the output and to account for your real use case and for
> differences in taste and judgment. - The plan is as described above, but
> there are many ways to implement it.
>
> from io import StringIO
>
> text = '''\
> <start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
> '''
>
> def records(source):
> current = []
> for line in source:
> if line.startswith('<end>'):
> yield current
> current = []
> else:
> current.append(line)
>
> def hasmango(record):
> return any('mango' in it for it in record)
>
> for record in records(StringIO(text)):
> hasmango(record) or print(*record)
Hi,
not working....this is the output i am getting...
\
<start>
guava
fruit
<start>
orange
fruit
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-22 11:24 +0200 |
| Message-ID | <mailman.11.1461317067.2861.python-list@python.org> |
| In reply to | #107482 |
harirammanohar@gmail.com wrote:
> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> wrote:
>> harirammanohar@gmail.com writes:
>>
>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
>> > hariram...@gmail.com wrote:
>> >> HI All,
>> >>
>> >> can you help me out in doing below.
>> >>
>> >> file:
>> >> <start>
>> >> guava
>> >> fruit
>> >> <end>
>> >> <start>
>> >> mango
>> >> fruit
>> >> <end>
>> >> <start>
>> >> orange
>> >> fruit
>> >> <end>
>> >>
>> >> need to delete from start to end if it contains mango in a file...
>> >>
>> >> output should be:
>> >>
>> >> <start>
>> >> guava
>> >> fruit
>> >> <end>
>> >> <start>
>> >> orange
>> >> fruit
>> >> <end>
>> >>
>> >> Thank you
>> >
>> > any one can guide me ? why xml tree parsing is not working if i have
>> > root.tag and root.attrib as mentioned in earlier post...
>>
>> Assuming the real consists of lines between a start marker and end
>> marker, a winning plan is to collect a group of lines, deal with it, and
>> move on.
>>
>> The following code implements something close to the plan. You need to
>> adapt it a bit to have your own source of lines and to restore the end
>> marker in the output and to account for your real use case and for
>> differences in taste and judgment. - The plan is as described above, but
>> there are many ways to implement it.
>>
>> from io import StringIO
>>
>> text = '''\
>> <start>
>> guava
>> fruit
>> <end>
>> <start>
>> mango
>> fruit
>> <end>
>> <start>
>> orange
>> fruit
>> <end>
>> '''
>>
>> def records(source):
>> current = []
>> for line in source:
>> if line.startswith('<end>'):
>> yield current
>> current = []
>> else:
>> current.append(line)
>>
>> def hasmango(record):
>> return any('mango' in it for it in record)
>>
>> for record in records(StringIO(text)):
>> hasmango(record) or print(*record)
>
> Hi,
>
> not working....this is the output i am getting...
>
> \
This means that the line
>> text = '''\
has trailing whitespace in your copy of the script.
> <start>
> guava
> fruit
>
> <start>
> orange
> fruit
Jussi forgot to add the "<end>..." line to the group. To fix this change the
generator to
def records(source):
    current = []
    for line in source:
        current.append(line)
        if line.startswith('<end>'):
            yield current
            current = []
>> hasmango(record) or print(*record)
The
print(*record)
inserts spaces between record entries (i. e. at the beginning of all lines
except the first) and adds a trailing newline. You can avoid this by
specifying the delimiters explicitly:
if not hasmango(record):
    print(*record, sep="", end="")
Even with these changes code still looks somewhat brittle...
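The separator behaviour described above can be seen by capturing what `print` writes. A small sketch, with a made-up three-line record and illustrative variable names:

```python
import io

# A made-up record; each entry already ends with its own newline.
record = ["<start>\n", "guava\n", "<end>\n"]

# Default call: entries are joined by sep=" " and followed by end="\n".
default_out = io.StringIO()
print(*record, file=default_out)

# Explicit delimiters: nothing is inserted between or after the entries.
explicit_out = io.StringIO()
print(*record, sep="", end="", file=explicit_out)

print(repr(default_out.getvalue()))
print(repr(explicit_out.getvalue()))
```

The first result has a space at the start of every line after the first plus one extra trailing newline; the second reproduces the record unchanged.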
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-22 14:10 +0300 |
| Message-ID | <lf57ffpdhl5.fsf@ling.helsinki.fi> |
| In reply to | #107485 |
Peter Otten writes:
> harirammanohar@gmail.com wrote:
>
>> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
>> wrote:
>>> harirammanohar@gmail.com writes:
>>>
>>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
>>> > hariram...@gmail.com wrote:
>>> >> HI All,
>>> >>
>>> >> can you help me out in doing below.
>>> >>
>>> >> file:
>>> >> <start>
>>> >> guava
>>> >> fruit
>>> >> <end>
>>> >> <start>
>>> >> mango
>>> >> fruit
>>> >> <end>
>>> >> <start>
>>> >> orange
>>> >> fruit
>>> >> <end>
>>> >>
>>> >> need to delete from start to end if it contains mango in a file...
>>> >>
>>> >> output should be:
>>> >>
>>> >> <start>
>>> >> guava
>>> >> fruit
>>> >> <end>
>>> >> <start>
>>> >> orange
>>> >> fruit
>>> >> <end>
>>> >>
>>> >> Thank you
>>> >
>>> > any one can guide me ? why xml tree parsing is not working if i have
>>> > root.tag and root.attrib as mentioned in earlier post...
>>>
>>> Assuming the real consists of lines between a start marker and end
>>> marker, a winning plan is to collect a group of lines, deal with it, and
>>> move on.
>>>
>>> The following code implements something close to the plan. You need to
>>> adapt it a bit to have your own source of lines and to restore the end
>>> marker in the output and to account for your real use case and for
>>> differences in taste and judgment. - The plan is as described above, but
>>> there are many ways to implement it.
>>>
>>> from io import StringIO
>>>
>>> text = '''\
>>> <start>
>>> guava
>>> fruit
>>> <end>
>>> <start>
>>> mango
>>> fruit
>>> <end>
>>> <start>
>>> orange
>>> fruit
>>> <end>
>>> '''
>>>
>>> def records(source):
>>> current = []
>>> for line in source:
>>> if line.startswith('<end>'):
>>> yield current
>>> current = []
>>> else:
>>> current.append(line)
>>>
>>> def hasmango(record):
>>> return any('mango' in it for it in record)
>>>
>>> for record in records(StringIO(text)):
>>> hasmango(record) or print(*record)
>>
>> Hi,
>>
>> not working....this is the output i am getting...
>>
>> \
>
> This means that the line
>
>>> text = '''\
>
> has trailing whitespace in your copy of the script.
That's a nuisance. I wish otherwise undefined escape sequences in
strings raised an error, similar to a stray space after a line
continuation character.
>> <start>
>> guava
>> fruit
>>
>> <start>
>> orange
>> fruit
>
> Jussi forgot to add the "<end>..." line to the group.
I didn't forget. I meant what I said when I said the OP needs to adapt
the code to (among other things) restore the end marker in the output.
If they can't be bothered to do anything at all, it's their problem.
It was already known that this is not the actual format of the data.
> To fix this change the generator to
>
> def records(source):
> current = []
> for line in source:
> current.append(line)
> if line.startswith('<end>'):
> yield current
> current = []
Oops, I notice that I forgot to start a new record only on encountering
a '<start>' line. That should probably be done, unless the format is
intended to be exactly a sequence of "<start>\n- -\n<end>\n".
>>> hasmango(record) or print(*record)
>
> The
>
> print(*record)
>
> inserts spaces between record entries (i. e. at the beginning of all
> lines except the first) and adds a trailing newline.
Yes, I forgot about the space. Sorry about that.
The final newline was intentional. Perhaps I should have added the end
marker there instead (given my preference to not drag it together with
the data lines), like so:
print(*record, sep = "", end = "<end>\n")
Or so:
print(*record, sep = "")
print("<end>")
Or so:
for line in record:
    print(line.rstrip("\n"))
else:
    print("<end>")
Or:
for line in record:
    print(line.rstrip("\n"))
else:
    if record and not record[-1].strip() == "<end>":
        print("<end>")
But all this is beside the point that to deal with the stated problem
one might want to obtain access to a whole record *first*, then check if
it contains "mango" in the intended way (details missing but at least
"mango\n" as a full line counts as an occurrence), and only *then* print
the whole record (if it doesn't contain "mango").
I can think of two other ways - one if the data can be accessed only
once - but they seem more complicated to me. Hm, well, if it's XML, as
stated in another branch of this thread and contrary to the form of the
example data in this branch, there's a third way that may be good, but
here I'm responding to a line-oriented format.
> You can avoid this by specifying the delimiters explicitly:
>
> if not hasmango(record):
> print(*record, sep="", end="")
>
> Even with these changes code still looks somewhat brittle...
That depends on the actual data format, and on what really is intended
to trigger the filter. This approach is a complete waste of effort if
there are no guarantees of things being there on their own lines, for
example.
Ok, that "\ " not only looks brittle but actually is brittle. The one
time I used that slash, I now regret doing so. Here's a fixed version.
(Not sure of the significance of the number of spaces that start the
first data line. They seem to have doubled along the way.)
text = '''<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
'''
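Putting the points from this exchange together, one possible line-oriented version might look like the sketch below. It is not the code either poster settled on: it assumes each marker sits at the start of its own line, it opens a record only on a `<start>` line, keeps the `<end>` line inside the record, and treats `mango` as a match only when it is a full line. The names `filter_records`, `start`, `end` and `word` are illustrative.

```python
from io import StringIO

def records(source, start="<start>", end="<end>"):
    """Yield lists of lines, each running from a start marker to an end marker."""
    current = None                      # None means "not inside a record"
    for line in source:
        if line.startswith(start):
            current = [line]            # open a new record
        elif current is not None:
            current.append(line)
            if line.startswith(end):
                yield current           # record complete, end marker included
                current = None

def filter_records(source, word="mango"):
    """Return the text of all records that do not contain word as a full line."""
    kept = []
    for record in records(source):
        if not any(line.strip() == word for line in record):
            kept.extend(record)
    return "".join(kept)

text = (
    "<start>\nguava\nfruit\n<end>\n"
    "<start>\nmango\nfruit\n<end>\n"
    "<start>\norange\nfruit\n<end>\n"
)
result = filter_records(StringIO(text))
print(result, end="")
```

Note that lines outside any `<start>` ... `<end>` pair are silently dropped in this sketch; whether that is acceptable depends on the real data.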
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-24 23:29 -0700 |
| Message-ID | <ee696bf4-706f-4113-bb91-d231ebf47b05@googlegroups.com> |
| In reply to | #107487 |
On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
>
> > harirammanohar@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar@gmail.com writes:
> >>>
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >>
> >>> >> can you help me out in doing below.
> >>> >>
> >>> >> file:
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> need to delete from start to end if it contains mango in a file...
> >>> >>
> >>> >> output should be:
> >>> >>
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> Thank you
> >>> >
> >>> > any one can guide me ? why xml tree parsing is not working if i have
> >>> > root.tag and root.attrib as mentioned in earlier post...
> >>>
> >>> Assuming the real consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>>
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>>
> >>> from io import StringIO
> >>>
> >>> text = '''\
> >>> <start>
> >>> guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>> mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>> orange
> >>> fruit
> >>> <end>
> >>> '''
> >>>
> >>> def records(source):
> >>> current = []
> >>> for line in source:
> >>> if line.startswith('<end>'):
> >>> yield current
> >>> current = []
> >>> else:
> >>> current.append(line)
> >>>
> >>> def hasmango(record):
> >>> return any('mango' in it for it in record)
> >>>
> >>> for record in records(StringIO(text)):
> >>> hasmango(record) or print(*record)
> >>
> >> Hi,
> >>
> >> not working....this is the output i am getting...
> >>
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
>
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
>
> >> <start>
> >> guava
> >> fruit
> >>
> >> <start>
> >> orange
> >> fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
>
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
>
> It was already known that this is not the actual format of the data.
>
> > To fix this change the generator to
> >
> > def records(source):
> > current = []
> > for line in source:
> > current.append(line)
> > if line.startswith('<end>'):
> > yield current
> > current = []
>
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
>
> >>> hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i. e. at the beginning of all
> > lines except the first) and adds a trailing newline.
>
> Yes, I forgot about the space. Sorry about that.
>
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
>
> print(*record, sep = "", end = "<end>\n")
>
> Or so:
>
> print(*record, sep = "")
> print("<end>")
>
> Or so:
>
> for line in record:
> print(line.rstrip("\n")
> else:
> print("<end>")
>
> Or:
>
> for line in record:
> print(line.rstrip("\n")
> else:
> if record and not record[-1].strip() == "<end>":
> print("<end>")
>
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
>
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
>
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> > print(*record, sep="", end="")
> >
> > Even with these changes code still looks somewhat brittle...
>
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
>
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
>
> text = '''<start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
> '''
Hi Jussi,
I have seen that you wrote a function to fulfill the requirement. Can we do the same thing using an XML parser? I have failed to implement it with Python's XML parser when the file has the content below...
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
and entire thing works if it has as below:
<!DOCTYPE web-app
<web-app>
what i observe is xml tree parsing is not working if http tags are there in between web-app...
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-25 10:17 +0300 |
| Message-ID | <lf5d1pew42b.fsf@ling.helsinki.fi> |
| In reply to | #107583 |
harirammanohar@gmail.com writes:

> Hi Jussi,
>
> i have seen you have written a definition to fulfill the requirement,
> can we do this same thing using xml parser, as i have failed to
> implement the thing using xml parser of python if the file is having
> the content as below...
>
> <!DOCTYPE web-app
> PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
> "http://java.sun.com/dtd/web-app_2_3.dtd">
>
> <web-app>
>
> and entire thing works if it has as below:
> <!DOCTYPE web-app
> <web-app>
>
> what i observe is xml tree parsing is not working if http tags are
> there in between web-app...

Do you get an error message?

My guess is that the parser needs the DTD but cannot access it. There
appears to be a DTD at that address, http://java.sun.com/... (it
redirects to Oracle, who bought Sun a while ago), but something might
prevent the parser from accessing it by default. If so, the details
depend on what parser you are trying to use. It may be possible to save
that DTD as a local file and point the parser to that.

Your problem is morphing rather wildly. A previous version had namespace
declarations but no DTD or XSD if I remember right. The initial version
wasn't XML at all.

If you post (1) an actual, minimal document, (2) the actual Python
commands that fail to parse it, and (3) the error message you get,
someone will be able to help you. The content of the document need not
be more than "hello, world" level. The DOCTYPE declaration and the
outermost tags with all their attributes and namespace declarations, if
any, are important.
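One way to check whether the DOCTYPE by itself is the obstacle is to parse a "hello, world" level document that carries it. The sketch below assumes the standard-library `xml.etree.ElementTree` is the parser in question (the thread has not yet established that at this point); as far as I know, its expat-based parser does not fetch an external DTD by default. The document content is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Minimal, made-up document that carries the DOCTYPE from the question.
doc = """<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>controller</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(doc)

# No namespace is declared here, so plain tag names match.
names = [s.findtext("servlet-name") for s in root.findall("servlet")]
print(root.tag, names)
```

If this parses while the real file does not behave as expected, the difference is more likely to lie in the outermost tag's attributes and namespace declarations than in the DOCTYPE.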
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-25 02:49 -0700 |
| Message-ID | <8001ac2b-c883-4ca1-a163-d118cc82295b@googlegroups.com> |
| In reply to | #107584 |
On Monday, April 25, 2016 at 12:47:14 PM UTC+5:30, Jussi Piitulainen wrote:
> harirammanohar@gmail.com writes:
>
> > Hi Jussi,
> >
> > i have seen you have written a definition to fulfill the requirement,
> > can we do this same thing using xml parser, as i have failed to
> > implement the thing using xml parser of python if the file is having
> > the content as below...
> >
> > <!DOCTYPE web-app
> > PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
> > "http://java.sun.com/dtd/web-app_2_3.dtd">
> >
> > <web-app>
> >
> > and entire thing works if it has as below:
> > <!DOCTYPE web-app
> > <web-app>
> >
> > what i observe is xml tree parsing is not working if http tags are
> > there in between web-app...
>
> Do you get an error message?
>
> My guess is that the parser needs the DTD but cannot access it. There
> appears to be a DTD at that address, http://java.sun.com/... (it
> redirects to Oracle, who bought Sun a while ago), but something might
> prevent the parser from accessing it by default. If so, the details
> depend on what parser you are trying to use. It may be possible to save
> that DTD as a local file and point the parser to that.
>
> Your problem is morphing rather wildly. A previous version had namespace
> declarations but no DTD or XSD if I remember right. The initial version
> wasn't XML at all.
>
> If you post (1) an actual, minimal document, (2) the actual Python
> commands that fail to parse it, and (3) the error message you get,
> someone will be able to help you. The content of the document need not
> be more than "hello, world" level. The DOCTYPE declaration and the
> outermost tags with all their attributes and namespace declarations, if
> any, are important.
Hi Jussi,
Here is an input file, sample.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                             http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1">
    <servlet>
        <servlet-name>controller</servlet-name>
        <servlet-class>com.mycompany.mypackage.ControllerServlet</servlet-class>
        <init-param>
            <param-name>listOrders</param-name>
            <param-value>com.mycompany.myactions.ListOrdersAction</param-value>
        </init-param>
        <init-param>
            <param-name>saveCustomer</param-name>
            <param-value>com.mycompany.myactions.SaveCustomerAction</param-value>
        </init-param>
        <load-on-startup>5</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>graph</servlet-name>
        <url-pattern>/graph</url-pattern>
    </servlet-mapping>
    <session-config>
        <session-timeout>30</session-timeout>
    </session-config>
</web-app>
--------------------------------
Here is the code:
import xml.etree.ElementTree as ET
ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
tree = ET.parse('sample.xml')
root = tree.getroot()

for servlet in root.findall('servlet'):
    servletname = servlet.find('servlet-name').text
    if servletname == "controller":
        root.remove(servlet)

tree.write('output.xml')
This works if the <web-app> tag doesn't have the attributes below:
xmlns="http://xmlns.jcp.org/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
| From | harirammanohar@gmail.com |
|---|---|
| Date | 2016-04-25 02:53 -0700 |
| Message-ID | <cd20025d-582b-4844-9334-23b334186c9c@googlegroups.com> |
| In reply to | #107589 |
On Monday, April 25, 2016 at 3:19:15 PM UTC+5:30, hariram...@gmail.com wrote:
> On Monday, April 25, 2016 at 12:47:14 PM UTC+5:30, Jussi Piitulainen wrote:
> > harirammanohar@gmail.com writes:
> >
> > > Hi Jussi,
> > >
> > > i have seen you have written a definition to fulfill the requirement,
> > > can we do this same thing using xml parser, as i have failed to
> > > implement the thing using xml parser of python if the file is having
> > > the content as below...
> > >
> > > <!DOCTYPE web-app
> > > PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
> > > "http://java.sun.com/dtd/web-app_2_3.dtd">
> > >
> > > <web-app>
> > >
> > > and entire thing works if it has as below:
> > > <!DOCTYPE web-app
> > > <web-app>
> > >
> > > what i observe is xml tree parsing is not working if http tags are
> > > there in between web-app...
> >
> > Do you get an error message?
> >
> > My guess is that the parser needs the DTD but cannot access it. There
> > appears to be a DTD at that address, http://java.sun.com/... (it
> > redirects to Oracle, who bought Sun a while ago), but something might
> > prevent the parser from accessing it by default. If so, the details
> > depend on what parser you are trying to use. It may be possible to save
> > that DTD as a local file and point the parser to that.
> >
> > Your problem is morphing rather wildly. A previous version had namespace
> > declarations but no DTD or XSD if I remember right. The initial version
> > wasn't XML at all.
> >
> > If you post (1) an actual, minimal document, (2) the actual Python
> > commands that fail to parse it, and (3) the error message you get,
> > someone will be able to help you. The content of the document need not
> > be more than "hello, world" level. The DOCTYPE declaration and the
> > outermost tags with all their attributes and namespace declarations, if
> > any, are important.
>
> Hi Jussi,
>
> Here is an input file...sample.xml
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
> http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
> version="3.1">
> <servlet>
> <servlet-name>controller</servlet-name>
> <servlet-class>com.mycompany.mypackage.ControllerServlet</servlet-class>
> <init-param>
> <param-name>listOrders</param-name>
> <param-value>com.mycompany.myactions.ListOrdersAction</param-value>
> </init-param>
> <init-param>
> <param-name>saveCustomer</param-name>
> <param-value>com.mycompany.myactions.SaveCustomerAction</param-value>
> </init-param>
> <load-on-startup>5</load-on-startup>
> </servlet>
>
>
> <servlet-mapping>
> <servlet-name>graph</servlet-name>
> <url-pattern>/graph</url-pattern>
> </servlet-mapping>
>
>
> <session-config>
> <session-timeout>30</session-timeout>
> </session-config>
> </web-app>
>
> --------------------------------
> Here is the code:
>
> import xml.etree.ElementTree as ET
> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
> tree = ET.parse('sample.xml')
> root = tree.getroot()
>
> for servlet in root.findall('servlet'):
>     servletname = servlet.find('servlet-name').text
>     if servletname == "controller":
>         root.remove(servlet)
>
> tree.write('output.xml')
>
> This will work if <web-app> </web-app> doesnt have below...
>
> xmlns="http://xmlns.jcp.org/xml/ns/javaee"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
> http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
By the way, I didn't get any error message, and I am using version 3.4.3.
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-25 13:37 +0300 |
| Message-ID | <lf5lh42rn2k.fsf@ling.helsinki.fi> |
| In reply to | #107591 |
harirammanohar@gmail.com writes:
> On Monday, April 25, 2016 at 3:19:15 PM UTC+5:30, hariram...@gmail.com wrote:
[- -]
>> Here is the code:
>>
>> import xml.etree.ElementTree as ET
>> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
>> tree = ET.parse('sample.xml')
>> root = tree.getroot()
>>
>> for servlet in root.findall('servlet'):
>>     servletname = servlet.find('servlet-name').text
>>     if servletname == "controller":
>>         root.remove(servlet)
>>
>> tree.write('output.xml')
[- -]
> By the way i didnt get any error message and i am using version 3.4.3
Right. The parsing succeeds but no 'servlet' elements are found and the
loop simply has no effect. I may be missing some technical detail, but I
think the 'servlet' elements in the document are in the default
namespace (because one was declared) while your .findall and .find calls
are looking for a 'servlet' element that is in no namespace at all. I
seem to remember that there is such a distinction in XML.
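The distinction can be seen directly in ElementTree: once a default namespace is declared, every unprefixed element's tag is stored in the form `{uri}name`, and a search for the bare name finds nothing. A small sketch, using a cut-down, made-up version of the posted document:

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.jcp.org/xml/ns/javaee"

# Cut-down stand-in for sample.xml, keeping only the default namespace.
doc = (
    '<web-app xmlns="%s" version="3.1">'
    '<servlet><servlet-name>controller</servlet-name></servlet>'
    '</web-app>' % NS
)
root = ET.fromstring(doc)

plain = root.findall("servlet")               # 'servlet' in no namespace
qualified = root.findall("{%s}servlet" % NS)  # 'servlet' in the declared one

print(root.tag)
print(len(plain), len(qualified))
```

Here `root.tag` comes out as `{http://xmlns.jcp.org/xml/ns/javaee}web-app`, the plain search returns an empty list, and the qualified search returns the one `servlet` element. That matches the observed behaviour: no error, but a loop that never runs.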
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-25 12:13 +0200 |
| Message-ID | <mailman.70.1461579263.32212.python-list@python.org> |
| In reply to | #107589 |
harirammanohar@gmail.com wrote:
> Here is the code:
Finally ;)
> import xml.etree.ElementTree as ET
> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
I don't know what this does, but probably not what you expected.
> tree = ET.parse('sample.xml')
> root = tree.getroot()
>
> for servlet in root.findall('servlet'):
>     servletname = servlet.find('servlet-name').text
I think you have to specify the namespace:
for servlet in root.findall('{http://xmlns.jcp.org/xml/ns/javaee}servlet'):
    servletname = servlet.find(
        '{http://xmlns.jcp.org/xml/ns/javaee}servlet-name').text
>     if servletname == "controller":
You could have added a print statement to verify that the line below is
executed.
>         root.remove(servlet)
>
> tree.write('output.xml')
>
> This will work if <web-app> </web-app> doesnt have below...
>
> xmlns="http://xmlns.jcp.org/xml/ns/javaee"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
> http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
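Putting the pieces together, a namespace-aware version of the removal might look like the sketch below. It uses the `namespaces` mapping accepted by `find` and `findall` instead of spelling out the `{uri}` form each time; the prefix `jee` is an arbitrary name chosen for this example, and the document is a shortened, made-up stand-in for sample.xml. As for `ET.register_namespace("", uri)`: it only affects serialization, so that the output uses `xmlns="..."` rather than generated `ns0:` prefixes, and it has no effect on searching.

```python
import xml.etree.ElementTree as ET

URI = "http://xmlns.jcp.org/xml/ns/javaee"
NS = {"jee": URI}  # "jee" is an arbitrary prefix, used only in search paths

# Affects output only: write the namespace as the default (no ns0: prefixes).
ET.register_namespace("", URI)

# Shortened stand-in for sample.xml.
doc = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
  <servlet><servlet-name>controller</servlet-name></servlet>
  <servlet><servlet-name>graph</servlet-name></servlet>
</web-app>"""
root = ET.fromstring(doc)

# findall returns a list, so removing children inside the loop is safe.
for servlet in root.findall("jee:servlet", NS):
    name = servlet.find("jee:servlet-name", NS)
    if name is not None and name.text == "controller":
        root.remove(servlet)

output = ET.tostring(root, encoding="unicode")
print(output)
```

With a file on disk, `ET.parse('sample.xml')` and `tree.write('output.xml')` would take the place of `fromstring` and `tostring`, as in the original code.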