


Re: Python 3 - xml - crlf handling problem

From: Stefan Behnel <stefan_ml@behnel.de>
Subject: Re: Python 3 - xml - crlf handling problem
Date: Wed, 30 Nov 2011 14:47:59 +0100
Newsgroups: comp.lang.python



durumdara, 30.11.2011 13:08:
> As far as I can see, XML parsing is "wrong" in Python.

You didn't say what you are using for parsing, but from your example, it 
appears likely that you are using the xml.dom.minidom module.


> I must use predefined XML files: parse them, extend them, and
> produce some result.
>
> But as far as I can see, this works incorrectly on Windows.
>
> When the predefined XMLs are "formatted" (pretty-printed) with CRLFs,
> the parser keeps these CR plus LF characters (it doesn't apply the rule
> that CR = LF = CRLF), and they appear in the new result too.

I assume that you are referring to XML's newline normalisation algorithm? 
That should normally be handled by the parser, which, in the case of 
minidom, is usually expat. And I seriously doubt that expat has a problem 
with something as basic as newline normalisation.
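To check that claim in isolation, here is a minimal sketch that feeds an in-memory document through xml.dom.minidom.parseString(), which uses expat underneath. The sample bytes are made up for illustration. XML 1.0 (section 2.11, "End-of-Line Handling") requires the parser to turn both a CRLF pair and a lone CR into a single LF before the application sees the text, so the resulting text node should contain only "\n".

```python
from xml.dom.minidom import parseString

# Made-up raw bytes with a CRLF pair and a lone CR inside element text,
# as they might appear in a file written on Windows.
raw = b"<root>\r\n  <item>line one\r\nline two\rline three</item>\r\n</root>"

doc = parseString(raw)
item = doc.getElementsByTagName("item")[0]
text = item.firstChild.data

# After parsing, both CRLF and the lone CR have been normalised to LF.
print(repr(text))  # 'line one\nline two\nline three'
```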

Did you verify that the newlines are really not being converted by the 
parser? From your example, I can only see that you are serialising the XML 
tree back into a file, which may or may not alter the line endings by 
itself. Instead, take a look at the text content in the tree right after 
parsing to see how line endings look at that level.
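One way to do that check, as a sketch: walk the parsed tree and count any CR characters still left in text nodes. The helper names text_nodes() and count_carriage_returns() are hypothetical, and the test file is generated on the fly instead of using test_original.xml. Note that a CR written as the character reference &#13; is deliberately kept by the parser, so a non-zero count can also come from that.

```python
import os
import tempfile
from xml.dom import Node
from xml.dom.minidom import parse


def text_nodes(node):
    """Yield every Text/CDATA node below `node`, depth first."""
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            yield child
        else:
            yield from text_nodes(child)


def count_carriage_returns(path):
    """Parse `path` and count the CR characters left in its text content."""
    doc = parse(path)
    return sum(n.data.count("\r") for n in text_nodes(doc.documentElement))


# Write a small CRLF-formatted file in binary mode, so that no newline
# translation happens on the way out, whatever the platform.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b'<?xml version="1.0"?>\r\n<a b="1">\r\n  <c>x</c>\r\n</a>\r\n')
    path = tmp.name

try:
    print(count_carriage_returns(path))  # 0: the parser normalised line endings
finally:
    os.remove(path)
```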


>      from xml.dom.minidom import parse
>      xo = parse('test_original.xml')
>      de = xo.documentElement
>      de.setAttribute('b', "2")
>      b = xo.toxml('utf-8')
>      f = open('test_original2.xml', 'wb')
>      f.write(b)
>      f.close()

This doesn't do any pretty printing, though, in case that's what you were 
really after (which appears likely according to your comments).
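If pretty printing is the goal, minidom offers toprettyxml(). A sketch, assuming the input was already indented: that existing indentation survives parsing as whitespace-only text nodes, and toprettyxml() would wrap its own indentation and newlines around them, producing extra blank lines. So this removes them first. strip_whitespace_nodes() is a hypothetical helper, and dropping all whitespace-only text is only safe when such whitespace carries no meaning in your documents.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only Text children (indentation left
    over from an earlier pretty-print), so toprettyxml() doesn't double it."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Made-up, already indented input with CRLF line endings.
src = b'<?xml version="1.0"?>\r\n<a b="1">\r\n  <c>x</c>\r\n</a>\r\n'
doc = parseString(src)
doc.documentElement.setAttribute("b", "2")
strip_whitespace_nodes(doc.documentElement)

# With an encoding given, toprettyxml() returns bytes; lines end in "\n".
pretty = doc.toprettyxml(indent="  ", encoding="utf-8")
print(pretty.decode("utf-8"))
```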


> And: if I used text elements, this can extend the information with
> plus characters and make wrong xml...

Sorry, I don't understand this sentence.


> Because of this problem, I can only use "myowngenerated" XMLs, not
> pretty-printed ones!

Then what is the actual problem? Do you get an error somewhere? And if so, 
could you paste the exact error message and describe what you do to produce 
it? The mere fact that the line endings use the normal platform-specific
representation doesn't seem like a problem to me.
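If a particular line ending is nevertheless required in the output file, it can be chosen explicitly when writing. A sketch, assuming the serialised text comes from toxml() without an encoding argument (so it is a str): in text mode, the default newline=None translates "\n" to os.linesep (CRLF on Windows), newline="\n" writes it unchanged, and newline="\r\n" forces CRLF on every platform. Writing bytes in "wb" mode, as in the quoted example, performs no translation at all. The file names are placeholders inside a temporary directory.

```python
import os
import tempfile
from xml.dom.minidom import parseString

doc = parseString(b"<a>\r\n  <c>x</c>\r\n</a>")
text = doc.toxml()  # a str; its line breaks are "\n" after parsing

with tempfile.TemporaryDirectory() as tmpdir:
    lf_path = os.path.join(tmpdir, "lf.xml")
    crlf_path = os.path.join(tmpdir, "crlf.xml")

    # newline="\n": "\n" is written unchanged on every platform.
    with open(lf_path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
    # newline="\r\n": every "\n" is translated to "\r\n" on every platform.
    with open(crlf_path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(text)

    # Read back in binary mode to see the bytes actually on disk.
    with open(lf_path, "rb") as f:
        lf_bytes = f.read()
    with open(crlf_path, "rb") as f:
        crlf_bytes = f.read()

print(lf_bytes.count(b"\r\n"), crlf_bytes.count(b"\r\n"))  # 0 2
```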

Stefan



Thread

Python 3 - xml - crlf handling problem durumdara <durumdara@gmail.com> - 2011-11-30 04:08 -0800
  Re: Python 3 - xml - crlf handling problem Stefan Behnel <stefan_ml@behnel.de> - 2011-11-30 14:47 +0100
    Re: Python 3 - xml - crlf handling problem durumdara <durumdara@gmail.com> - 2011-12-02 00:13 -0800
      Re: Python 3 - xml - crlf handling problem Stefan Behnel <stefan_ml@behnel.de> - 2011-12-02 12:23 +0100
