Re: Python to do CDC on XML files

From Bruce Kirk <bruce.kirk24@gmail.com>
Newsgroups comp.lang.python
Subject Re: Python to do CDC on XML files
Date Wed, 23 Mar 2016 19:57:12 -0400
Lines 18
Message-ID <mailman.79.1458801774.2244.python-list@python.org>
References <833ad88a-4840-4a23-8ab3-b736068b49fe@googlegroups.com> <CAP1rxO79Rzo3tAhR9E5djkhWB79x2QrHB-+0rStW_girQumobg@mail.gmail.com>

I agree. The challenge is the volume of data to compare: 13 million records, so it needs to be very fast.

Sent from my iPad

> On Mar 23, 2016, at 4:47 PM, Bob Gailer <bgailer@gmail.com> wrote:
> 
> 
> On Mar 23, 2016 4:20 PM, "Bruce Kirk" <bruce.kirk24@gmail.com> wrote:
> >
> > Does anyone know of any existing projects that generate a change data capture (CDC) from two very large XML files?
> >
> > The XML structures are the same; only the data within the files may differ.
> >
> It should not be too difficult to write a program that locates the tags delimiting each record, then compares the records.
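
For what it's worth, the approach Bob describes could be sketched roughly as below. This is only a minimal sketch under assumptions that are not in the thread: each record is a `<record>` element carrying a unique `id` attribute (both names are hypothetical), and a change is detected by comparing a hash of each record's serialized content. It streams the files with the standard library's `iterparse` rather than loading them whole.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def record_digests(source, record_tag="record", key_attr="id"):
    """Stream one XML file and map each record's key to a digest of its content.

    record_tag and key_attr are assumptions for illustration; adjust them
    to whatever tag and key actually delimit a record in your files.
    """
    digests = {}
    # iterparse reports each element once its end tag has been read, so the
    # full tree never has to be held with all its data at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        key = elem.get(key_attr)
        if key is None:
            raise ValueError("record without a %r attribute" % key_attr)
        # Whitespace after the closing tag is not part of the record.
        elem.tail = None
        digests[key] = hashlib.sha1(ET.tostring(elem)).digest()
        # Free the record's children; the emptied element itself stays
        # attached to its parent, so memory still grows slowly.
        elem.clear()
    return digests


def diff_records(old, new):
    """Return (added, removed, changed) keys between two digest maps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Tiny in-memory stand-ins for the two large files.
old_xml = b"<root><record id='1'><v>a</v></record><record id='2'><v>b</v></record></root>"
new_xml = b"<root><record id='2'><v>B</v></record><record id='3'><v>c</v></record></root>"

added, removed, changed = diff_records(
    record_digests(io.BytesIO(old_xml)),
    record_digests(io.BytesIO(new_xml)),
)
print(added, removed, changed)  # ['3'] ['1'] ['2']
```

With 13 million records the two digest maps will themselves take a fair amount of memory, and byte-level hashing treats differences in whitespace inside a record or in attribute order as changes, so treat this as a starting point rather than a tuned solution.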

Thread

Python to do CDC on XML files Bruce Kirk <bruce.kirk24@gmail.com> - 2016-03-23 13:16 -0700
  Re: Python to do CDC on XML files Bob Gailer <bgailer@gmail.com> - 2016-03-23 16:47 -0400
  Re: Python to do CDC on XML files Bruce Kirk <bruce.kirk24@gmail.com> - 2016-03-23 19:57 -0400
  Re: Python to do CDC on XML files Chris Angelico <rosuav@gmail.com> - 2016-03-24 18:00 +1100
  Re: Python to do CDC on XML files Peter Otten <__peter__@web.de> - 2016-03-24 09:19 +0100
