To: python-list@python.org
From: Stefan Behnel <stefan_ml@behnel.de>
Subject: Re: Python 3 - xml - crlf handling problem
Date: Wed, 30 Nov 2011 14:47:59 +0100
Newsgroups: comp.lang.python

durumdara, 30.11.2011 13:08:
> As I see that XML parsing is "wrong" in Python.

You didn't say what you are using for parsing, but from your example, it 
appears likely that you are using the xml.dom.minidom module.


> I must use predefined XML files, parsing them, extending them, and
> produce some result.
>
> But as I see that in Windows this is working wrong.
>
> When the predefined XMLs are "formatted" (prettied) with CRLFs, then
> the parser keeps these plus LF characters (not handle the logic that
> CR = LF = CRLF), and it is appearing in the new result too.

I assume that you are referring to XML's newline normalisation algorithm? 
That should normally be handled by the parser, which, in the case of 
minidom, is usually expat. And I seriously doubt that expat has a problem 
with something as basic as newline normalisation.

Did you verify that the newlines are really not being converted by the 
parser? From your example, I can only see that you are serialising the XML 
tree back into a file, which may or may not alter the line endings by 
itself. Instead, take a look at the text content in the tree right after 
parsing to see how line endings look at that level.
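
As a rough sketch of that check, assuming minidom is indeed what you are 
using: parse a small document with CRLF line endings and print the repr() 
of the text nodes. The sample document below is made up for illustration; 
substitute your own file via parse() instead of parseString().

```python
from xml.dom.minidom import parseString

# Hypothetical sample input with CRLF line endings, as a Windows editor
# might save it. Passed as bytes so nothing touches the newlines before
# the parser sees them.
data = b'<root a="1">\r\n  <item>text</item>\r\n</root>'

doc = parseString(data)
root = doc.documentElement

# Show the raw content of each text node directly under the root.
# repr() makes any surviving '\r' visible.
for node in root.childNodes:
    if node.nodeType == node.TEXT_NODE:
        print(repr(node.data))
```

If the parser normalises newlines as expected, none of the printed strings 
should contain '\r'.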


>      xo = parse('test_original.xml')
>      de = xo.documentElement
>      de.setAttribute('b', "2")
>      b = xo.toxml('utf-8')
>      f = open('test_original2.xml', 'wb')
>      f.write(b)
>      f.close()

This doesn't do any pretty printing, though, in case that's what you were 
really after (which appears likely according to your comments).
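
To illustrate the difference, here is a minimal sketch comparing toxml() 
with minidom's toprettyxml(). The input document and the indent value are 
only examples, not taken from your files.

```python
from xml.dom.minidom import parseString

# Hypothetical compact input without any whitespace between elements.
doc = parseString(b'<root><item>text</item></root>')
doc.documentElement.setAttribute('b', '2')

# toxml() serialises the tree as it is and adds no whitespace of its own.
compact = doc.toxml('utf-8')

# toprettyxml() inserts newlines and indentation between nodes.
pretty = doc.toprettyxml(indent='  ', encoding='utf-8')

print(compact.decode('utf-8'))
print(pretty.decode('utf-8'))
```

Note that toprettyxml() adds whitespace around the nodes that are already 
in the tree, so running it on an input that was indented before may leave 
extra blank lines in the result.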


> And: if I used text elements, this can extend the information with
> plus characters and make wrong xml...

Sorry, I don't understand this sentence.


> I can use only "myowngenerated", and not prettied xmls because of this
> problem!

Then what is the actual problem? Do you get an error somewhere? And if so, 
could you paste the exact error message and describe what you do to produce 
it? The mere fact that the line endings use the normal platform-specific 
representation doesn't seem like a problem to me.

Stefan