From: Stefan Behnel
Subject: Re: Python 3 - xml - crlf handling problem
Date: Wed, 30 Nov 2011 14:47:59 +0100
Newsgroups: comp.lang.python

durumdara, 30.11.2011 13:08:
> As I see that XML parsing is "wrong" in Python.

You didn't say what you are using for parsing, but from your example, it appears likely that you are using the xml.dom.minidom module.

> I must use predefined XML files, parsing them, extending them, and
> produce some result.
>
> But as I see that in Windows this is working wrong.
>
> When the predefined XMLs are "formatted" (prettied) with CRLFs, then
> the parser keeps these plus LF characters (not handle the logic that
> CR = LF = CRLF), and it is appearing in the new result too.

I assume that you are referring to XML's newline normalisation algorithm?
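For reference, XML 1.0 (section 2.11, "End-of-Line Handling") requires the parser to pass every CRLF pair, and every lone CR, to the application as a single LF. A minimal sketch of how to check this with minidom, assuming Python 3; the input bytes are made up to mimic a CRLF-formatted file, and text_nodes() is just a small helper written for this example, not part of minidom:

```python
from xml.dom.minidom import parseString

# Made-up input, formatted with Windows-style CRLF line endings.
data = b'<root>\r\n  <item>first\r\nsecond</item>\r\n</root>\r\n'

doc = parseString(data)  # minidom builds the tree using expat


def text_nodes(node):
    """Hypothetical helper: yield the data of every text node below `node`."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child.data
        else:
            yield from text_nodes(child)


# Inspect the tree right after parsing, before any serialisation step.
texts = list(text_nodes(doc.documentElement))
item_text = doc.getElementsByTagName('item')[0].firstChild.data

print(repr(item_text))
# True would mean a CR survived parsing; normalisation should prevent that.
print(any('\r' in t for t in texts))
```

If expat normalises as the spec requires, item_text holds a single LF between the two words and the second line prints False.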
That should normally be handled by the parser, which, in the case of minidom, is usually expat. And I seriously doubt that expat has a problem with something as basic as newline normalisation.

Did you verify that the newlines are really not being converted by the parser? From your example, I can only see that you are serialising the XML tree back into a file, which may or may not alter the line endings by itself. Instead, take a look at the text content in the tree right after parsing to see how line endings look at that level.

> xo = parse('test_original.xml')
> de = xo.documentElement
> de.setAttribute('b', "2")
> b = xo.toxml('utf-8')
> f = open('test_original2.xml', 'wb')
> f.write(b)
> f.close()

This doesn't do any pretty printing, though, in case that's what you were really after (which seems likely, judging from your comments).

> And: if I used text elements, this can extend the information with
> plus characters and make wrong xml...

Sorry, I don't understand this sentence.

> I can use only "myowngenerated", and not prettied xmls because of this
> problem!

Then what is the actual problem? Do you get an error somewhere? And if so, could you paste the exact error message and describe what you do to produce it? The mere fact that the line endings use the normal platform-specific representation doesn't seem like a problem to me.

Stefan
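A self-contained variant of the quoted round trip, in case it helps to check the serialisation side as well. It is only a sketch: the CRLF input bytes are made up as a stand-in for test_original.xml, and an io.BytesIO object replaces test_original2.xml so the written bytes can be inspected directly. Since the output is written in binary mode, Python 3 performs no newline translation on it, so any CR in the result would have to come from toxml() itself:

```python
import io
from xml.dom.minidom import parseString

# Made-up stand-in for test_original.xml, saved with CRLF line endings.
original = b'<root a="1">\r\n  <item>text</item>\r\n</root>\r\n'

xo = parseString(original)
de = xo.documentElement
de.setAttribute('b', "2")
b = xo.toxml('utf-8')  # returns bytes because an encoding was given

# In-memory stand-in for open('test_original2.xml', 'wb'): binary mode
# does no newline translation, so these bytes are stored unchanged.
out = io.BytesIO()
out.write(b)
written = out.getvalue()

print(written)
print(b'\r' in written)  # False if parsing normalised the CRLFs

# toxml() only re-emits the existing whitespace text nodes; indented
# output is what toprettyxml() is for (indent/newl control its whitespace).
pretty = xo.toprettyxml(indent='  ', newl='\n', encoding='utf-8')
```

If the parser normalised the input, the check prints False, which would point away from minidom's serialisation as the source of any CRs you are seeing.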