| Started by | Dave Angel <d@davea.name> |
|---|---|
| First post | 2012-01-12 12:02 -0500 |
| Last post | 2012-01-12 12:02 -0500 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Is there a way to merge two XML files via Python? Dave Angel <d@davea.name> - 2012-01-12 12:02 -0500
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-01-12 12:02 -0500 |
| Subject | Re: Is there a way to merge two XML files via Python? |
| Message-ID | <mailman.4687.1326387773.27778.python-list@python.org> |
On 01/12/2012 11:39 AM, Stefan Behnel wrote:
> J, 12.01.2012 17:04:
>> This is more a theory exercise and something I'm trying to figure out,
>> and this is NOT a homework assignment...
>>
>> I'm trying to make a tool I use at work more efficient :)
>>
>> So this is a test tool that generates an XML file as its output that
>> is eventually used by a web service to display test results and system
>> information.
>>
>> The problem is that the testing is broken down into two different runs:
>> Functional and Automated where the Functional tests are all manual,
>> then the automated tests are run separately, usually overnight.
>>
>> Each of those test runs generates essentially an identical XML file.
>> What I want to learn is a way to merge them.
> Ok - how large are these files? (i.e., do they easily fit into memory?)
>
>
>> In abstract terms, the idea is essentially to diff the two files
>> creating a patch and then use that patch to merge the two files into a
>> single XML file.
> I wouldn't go through patch. If they fit into memory, just load both, merge
> one into the other eliminating duplicates, and save that.
>
> Or rather, load just one and process the other one incrementally using
> ElementTree's iterparse().
>
>
>> So what I was hoping I could get pointers on from those of you who are
>> experienced in using Python with XML is what Python libraries or means
>> are there for working with XML files specifically, and how easy or
>> difficult would this be?
> Depends on how easy it is to recognise duplicates in your specific data
> format. Once you've managed to do that, the rest is trivial.
>
>
>> I'm also doing research on my own in my spare time on this, but I also
>> wanted to ask here to get the opinion of developers who are more
>> experienced in working with XML than I am.
> I recommend looking at the stdlib xml.etree.ElementTree module or the
> external lxml package (which contains the ElementTree compatible lxml.etree
> module). The latter will (likely) make things easier due to full XPath
> support and some other goodies, but ElementTree is also quite quick and
> easy to use by itself.
>
> Stefan
>

Question for Jeff: Have you tried doing it by hand? Do you know when a duplicate should be ignored, when it should be replicated, when it should be represented by incrementing a count? XML is very flexible, but the final reader of your file may not be so flexible (e.g. if it has to match a WSDL).

If two runs differ only by some timing field, then you might need to sum those times, and produce an average in the final run. Or a max value, or both.

--
DaveA