From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Python to do CDC on XML files
Date: Thu, 24 Mar 2016 09:19:32 +0100

Bruce Kirk wrote:

> Does anyone know of any existing projects on how to generate a change data
> capture on 2 very large xml files.
>
> The xml structures are the same, it is the data within the files that may
> differ.
>
> I need to take an XML file from yesterday and compare it to the XML file
> produced today and note which XML records have changed.
>
> I have done a Google search and I am not able to find much on the subject
> other than software vendors trying to sell me their products. :-)

There is xmldiff:

http://www.logilab.org/project/xmldiff

As an alternative, you may try to log the changes as they occur instead of
inspecting the result.

If the application generating the file is not under your control, does it
offer other output formats, e.g. CSV?

Or, if the XML file is basically a sequence of one type of node, you may
convert it to a database (sqlite will do) to match and compare the "records".
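A minimal sketch of the sqlite route, assuming (hypothetically) that each file is a flat sequence of <record id="..."> elements whose child elements hold the fields, with no XML namespaces. RECORD_TAG, KEY_ATTR, record_payload(), load(), compare() and the table names are illustrative, not part of any existing tool. Each file is streamed with xml.etree.ElementTree.iterparse, so neither file has to be built into one big tree. A normalised copy of every record is stored in sqlite, and SQL reports which ids were added, removed or changed.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical layout: <root><record id="..."><field>...</field>...</record>...</root>
RECORD_TAG = "record"  # assumed name of the repeated node
KEY_ATTR = "id"        # assumed attribute that identifies a record


def record_payload(elem):
    """Turn a record's attributes and direct children into a stable string.

    Text is stripped, so re-indenting the file is not reported as a change;
    child order still matters, and grandchildren are ignored (flat records assumed).
    """
    attrs = sorted((k, v) for k, v in elem.attrib.items() if k != KEY_ATTR)
    fields = [(child.tag, sorted(child.attrib.items()), (child.text or "").strip())
              for child in elem]
    return json.dumps([attrs, fields])


def load(conn, table, source):
    """Stream every record from an XML file (path or binary file object) into table."""
    conn.execute(f"CREATE TABLE {table} (rec_id TEXT PRIMARY KEY NOT NULL, payload TEXT)")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == RECORD_TAG:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)",
                         (elem.get(KEY_ATTR), record_payload(elem)))
            elem.clear()  # drop the record's contents to limit memory use


def compare(old_source, new_source):
    """Return (added, removed, changed) record ids between two XML snapshots."""
    conn = sqlite3.connect(":memory:")  # pass a file name instead if RAM is tight
    load(conn, "yesterday", old_source)
    load(conn, "today", new_source)

    def ids(sql):
        return [row[0] for row in conn.execute(sql)]

    added = ids("SELECT rec_id FROM today WHERE rec_id NOT IN "
                "(SELECT rec_id FROM yesterday) ORDER BY rec_id")
    removed = ids("SELECT rec_id FROM yesterday WHERE rec_id NOT IN "
                  "(SELECT rec_id FROM today) ORDER BY rec_id")
    changed = ids("SELECT t.rec_id FROM today t JOIN yesterday y USING (rec_id) "
                  "WHERE t.payload <> y.payload ORDER BY t.rec_id")
    conn.close()
    return added, removed, changed
```

For example, if yesterday's file holds records 1 and 2 and today's file holds a modified record 2 plus a new record 3, compare() returns (['3'], ['1'], ['2']). A record without the key attribute makes the INSERT fail (NOT NULL) rather than silently skewing the NOT IN queries.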