| Date | 2012-01-12 12:02 -0500 |
|---|---|
| From | Dave Angel <d@davea.name> |
| Subject | Re: Is there a way to merge two XML files via Python? |
| References | <CAFB6qZvnBSLzZ7gRSxSnSm6Dj8cTr+bpMQOuNTO3CnSdMDLXfQ@mail.gmail.com> <jen2ba$2kq$1@dough.gmane.org> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.4687.1326387773.27778.python-list@python.org> |
On 01/12/2012 11:39 AM, Stefan Behnel wrote:
> J, 12.01.2012 17:04:
>> This is more a theory exercise and something I'm trying to figure out,
>> and this is NOT a homework assignment...
>>
>> I'm trying to make a tool I use at work more efficient :)
>>
>> So this is a test tool that generates an XML file as its output that
>> is eventually used by a web service to display test results and system
>> information.
>>
>> The problem is that the testing is broken down into two different runs:
>> Functional and Automated, where the Functional tests are all manual,
>> then the automated tests are run separately, usually overnight.
>>
>> Each of those test runs generates essentially an identical XML file.
>> What I want to learn is a way to merge them.
> Ok - how large are these files? (i.e., do they easily fit into memory?)
>
>
>> In abstract terms, the idea is essentially to diff the two files,
>> creating a patch, and then use that patch to merge the two files into a
>> single XML file.
> I wouldn't go through patch. If they fit into memory, just load both, merge
> one into the other eliminating duplicates, and save that.
>
> Or rather, load just one and process the other one incrementally using
> ElementTree's iterparse().
>
>
>> So what I was hoping I could get pointers on from those of you who are
>> experienced in using Python with XML is what Python libraries or means
>> are there for working with XML files specifically, and how easy or
>> difficult would this be?
> Depends on how easy it is to recognise duplicates in your specific data
> format. Once you've managed to do that, the rest is trivial.
>
>
>> I'm also doing research on my own in my spare time on this, but I also
>> wanted to ask here to get the opinion of developers who are more
>> experienced in working with XML than I am.
> I recommend looking at the stdlib xml.etree.ElementTree module or the
> external lxml package (which contains the ElementTree compatible lxml.etree
> module). The latter will (likely) make things easier due to full XPath
> support and some other goodies, but ElementTree is also quite quick and
> easy to use by itself.
>
> Stefan
>

Question for Jeff: Have you tried doing it by hand? Do you know when a
duplicate should be ignored, when it should be replicated, and when it
should be represented by incrementing a count?

XML is very flexible, but the final reader of your file may not be so
flexible (e.g. if it has to match a WSDL).

If two runs differ only by some timing field, then you might need to sum
those times and produce an average in the final run. Or a max value, or
both.

--
DaveA
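For illustration only, here is a minimal sketch of the "load one, stream the other with iterparse()" approach quoted above, combined with one possible answer to the duplicate question. The file layout is an assumption, since the thread never shows the real format: a `<results>` root holding flat `<test name="..." time="..."/>` elements, with `name` as the duplicate key and "keep the larger time" as the merge policy. `merge_results` and its parameters are made-up names, not part of any library.

```python
import copy
import io
import xml.etree.ElementTree as ET


def merge_results(base_source, other_source, tag="test", key="name"):
    """Merge `tag` elements from other_source into the tree parsed from base_source.

    Assumed (hypothetical) layout: flat <test name="..." time="..."/> elements
    directly under the root. Two elements are duplicates if `key` matches.
    """
    tree = ET.parse(base_source)  # the first file is loaded fully into memory
    root = tree.getroot()

    # Index the entries already present by the assumed duplicate key.
    seen = {el.get(key): el for el in root.iter(tag)}

    # Stream the second file instead of loading it whole.
    for _event, el in ET.iterparse(other_source, events=("end",)):
        if el.tag != tag:
            continue
        existing = seen.get(el.get(key))
        if existing is None:
            # New entry: copy it into the base tree.
            new_el = copy.deepcopy(el)
            root.append(new_el)
            seen[new_el.get(key)] = new_el
        else:
            # Duplicate: assumed policy is to keep the larger timing value.
            # Summing, averaging or counting would be changed here.
            old_time = float(existing.get("time", "0"))
            new_time = float(el.get("time", "0"))
            existing.set("time", str(max(old_time, new_time)))
        el.clear()  # release the streamed element's contents
    return tree


# Small in-memory demonstration with invented data.
functional = io.BytesIO(
    b'<results><test name="boot" time="1.5"/><test name="net" time="2.0"/></results>'
)
automated = io.BytesIO(
    b'<results><test name="net" time="3.0"/><test name="disk" time="0.5"/></results>'
)
merged = merge_results(functional, automated)
for test in merged.getroot().findall("test"):
    print(test.get("name"), test.get("time"))
```

File names can be passed in place of the `BytesIO` objects, and `merged.write("merged.xml")` would save the result. Whether the max is the right policy depends entirely on what the final reader of the file expects.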