


Re: Python to do CDC on XML files

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: Python to do CDC on XML files
Date Thu, 24 Mar 2016 09:19:32 +0100
Message-ID <mailman.82.1458807583.2244.python-list@python.org>
References <833ad88a-4840-4a23-8ab3-b736068b49fe@googlegroups.com>



Bruce Kirk wrote:

> Does anyone know of any existing projects that generate a change data
> capture (CDC) from two very large XML files?
> 
> The XML structures are the same; only the data within the files may
> differ.
> 
> I need to take an XML file from yesterday, compare it to the XML file
> produced today, and note which XML records have changed.
> 
> I have done a Google search and I am not able to find much on the subject
> other than software vendors trying to sell me their products. :-)

There is

http://www.logilab.org/project/xmldiff

As an alternative you may try to log the changes as they occur instead of
inspecting the result. If the application generating the file is not under
your control, does it offer other output formats, e.g. CSV?

Or, if the XML file is basically a sequence of one type of node, you may
convert it to a database (SQLite will do) to match and compare the
"records".
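
The database idea could be sketched roughly as follows. This is only a
minimal illustration under assumptions that are not in the original
question: each record is an element named `record`, it carries a unique
`id` attribute, and the table names `yesterday` and `today` are made up for
the example. Each file is streamed with `iterparse`, one row per record is
stored with a hash of its serialized content, and SQL then reports keys
that were added, removed, or changed.

```python
import hashlib
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_records(conn, table, source, record_tag="record", key_attr="id"):
    """Stream `source` and store one (key, digest) row per record element.

    `record_tag` and `key_attr` are assumptions about the file layout.
    `table` must be a trusted, hard-coded name (it is interpolated into SQL).
    """
    conn.execute(f"CREATE TABLE {table} (key TEXT PRIMARY KEY, digest TEXT)")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            elem.tail = None  # ignore whitespace between records
            digest = hashlib.sha256(ET.tostring(elem)).hexdigest()
            conn.execute(
                f"INSERT INTO {table} VALUES (?, ?)",
                (elem.get(key_attr), digest),
            )
            elem.clear()  # drop the record's content to keep memory low
    conn.commit()


def diff_tables(conn):
    """Return (added, removed, changed) lists of record keys."""
    added = [row[0] for row in conn.execute(
        "SELECT key FROM today WHERE key NOT IN "
        "(SELECT key FROM yesterday) ORDER BY key")]
    removed = [row[0] for row in conn.execute(
        "SELECT key FROM yesterday WHERE key NOT IN "
        "(SELECT key FROM today) ORDER BY key")]
    changed = [row[0] for row in conn.execute(
        "SELECT today.key FROM today JOIN yesterday "
        "ON yesterday.key = today.key "
        "WHERE yesterday.digest <> today.digest ORDER BY today.key")]
    return added, removed, changed


# Tiny in-memory stand-ins for the two large files (made-up data).
YESTERDAY = (b"<records>"
             b"<record id='1'><name>Ann</name></record>"
             b"<record id='2'><name>Bob</name></record>"
             b"<record id='3'><name>Cy</name></record>"
             b"</records>")
TODAY = (b"<records>"
         b"<record id='1'><name>Ann</name></record>"
         b"<record id='2'><name>Rob</name></record>"
         b"<record id='4'><name>Di</name></record>"
         b"</records>")

conn = sqlite3.connect(":memory:")
load_records(conn, "yesterday", io.BytesIO(YESTERDAY))
load_records(conn, "today", io.BytesIO(TODAY))
added, removed, changed = diff_tables(conn)
print("added:", added, "removed:", removed, "changed:", changed)
```

For real files you would pass file names instead of `io.BytesIO` objects
and probably use an on-disk database. Note that the digest compares the
serialized text of each record, so differences in whitespace inside a
record count as changes.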


