From: Bruce Kirk
Newsgroups: comp.lang.python
Subject: Re: Python to do CDC on XML files
Date: Wed, 23 Mar 2016 19:57:12 -0400

I agree. The challenge is the volume of data to compare: 13 million records. So it needs to be very fast.

> On Mar 23, 2016, at 4:47 PM, Bob Gailer wrote:
>
> On Mar 23, 2016 4:20 PM, "Bruce Kirk" wrote:
> >
> > Does anyone know of any existing projects on how to generate a change data capture (CDC) on 2 very large XML files?
> >
> > The XML structures are the same; it is the data within the files that may differ.
> >
> It should not be too difficult to write a program that locates the tags delimiting each record, then compares them.
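
Bob's suggestion (find each record, then compare) can be sketched with the standard library's streaming parser, so neither file has to be loaded whole. This is a minimal sketch, not code from the thread: the record tag `record` and the key attribute `id` are hypothetical placeholders for the real schema, and it assumes every record carries a unique key. Only a 16-byte hash per record of the old file is kept in a dict; at 13 million records that dict is still large, so check it fits in RAM (if not, sorting both files by key and merging them is the usual fallback).

```python
import hashlib
import io
import xml.etree.ElementTree as ET

# Hypothetical names: adjust to the real schema. The sketch assumes every
# record is a <record> element with a unique "id" attribute.
RECORD_TAG = "record"
KEY_ATTR = "id"


def iter_records(source):
    """Stream (key, digest) pairs for each record without loading the whole file."""
    path = []  # elements that are currently open, innermost last
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem)
            continue
        path.pop()
        if elem.tag != RECORD_TAG:
            continue
        elem.tail = None  # text after the closing tag is not part of the record
        digest = hashlib.blake2b(ET.tostring(elem), digest_size=16).digest()
        yield elem.get(KEY_ATTR), digest
        if path:
            path[-1].remove(elem)  # drop the finished record to keep memory flat


def changes(old_source, new_source):
    """Return (inserted, updated, deleted) record keys between two files."""
    old = dict(iter_records(old_source))  # one small digest per old record
    inserted, updated = [], []
    for key, digest in iter_records(new_source):
        previous = old.pop(key, None)
        if previous is None:
            inserted.append(key)
        elif previous != digest:
            updated.append(key)
    deleted = list(old)  # old keys never seen in the new file
    return inserted, updated, deleted


OLD = b"""<records>
  <record id="1"><name>Ann</name></record>
  <record id="2"><name>Bob</name></record>
  <record id="3"><name>Cy</name></record>
</records>"""
NEW = b"""<records>
  <record id="1"><name>Ann</name></record>
  <record id="2"><name>Robert</name></record>
  <record id="4"><name>Di</name></record>
</records>"""

if __name__ == "__main__":
    print(changes(io.BytesIO(OLD), io.BytesIO(NEW)))  # (['4'], ['2'], ['3'])
```

For real files, pass the file paths instead of the in-memory samples. Because each record is compared via its raw `ET.tostring` bytes, a different attribute order or indentation inside a record counts as a change; `ET.canonicalize` (Python 3.8+) could normalize that, at some extra cost per record.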