| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: Python to do CDC on XML files |
| Date | Thu, 24 Mar 2016 18:00:51 +1100 |
| Lines | 18 |
| Message-ID | <mailman.81.1458802854.2244.python-list@python.org> (permalink) |
| References | <833ad88a-4840-4a23-8ab3-b736068b49fe@googlegroups.com> <CAP1rxO79Rzo3tAhR9E5djkhWB79x2QrHB-+0rStW_girQumobg@mail.gmail.com> <683FF696-8223-46FB-9A72-55839A8B4241@gmail.com> |
On Thu, Mar 24, 2016 at 10:57 AM, Bruce Kirk <bruce.kirk24@gmail.com> wrote:
> I agree, the challenge is the volume of the data to compare is 13 million records. So it needs to be very fast.

13M records is a good lot. To what extent can the data change? You may find it easiest to do some sort of conversion to text, throwing away any information that isn't "interesting", and then use the standard 'diff' utility to compare the text files. It's up to you to figure out what differences are "uninteresting"; it'll depend on your exact data. As long as you can do the conversion-to-text in a simple and straightforward way, the overall operation will be reasonably fast.

If this is a periodic thing (e.g. you're constantly checking today's file against yesterday's), saving the dumped text file will mean you generally need to just convert one file, halving your workload.

This isn't a solution so much as a broad pointer... hope it's at least a start!

ChrisA
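The conversion-to-text step described above might look something like the following. This is only a rough sketch under assumptions: the thread never shows the actual XML layout, so the record tag `record`, the ignored field `last_checked`, and the function name `dump_records` are all hypothetical. It streams the file with `xml.etree.ElementTree.iterparse` and writes one normalized line per record, so that a line-based tool can compare two dumps.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical names: the thread does not show the real schema, so the
# record tag and the "uninteresting" field below are assumptions.
RECORD_TAG = "record"
UNINTERESTING = {"last_checked"}


def dump_records(xml_source, out):
    """Stream an XML file and write one normalized text line per record.

    xml_source is a filename or a binary file object; out is a text file
    object. Returns the number of records written.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != RECORD_TAG:
            continue
        # Keep only the interesting child fields, sorted by tag name so
        # that a change in field order alone produces no difference.
        fields = sorted(
            (child.tag, (child.text or "").strip())
            for child in elem
            if child.tag not in UNINTERESTING
        )
        out.write("\t".join(f"{tag}={text}" for tag, text in fields) + "\n")
        # Drop the children of the record we just handled to limit memory
        # use; the emptied element itself stays attached to the root.
        elem.clear()
        count += 1
    return count


if __name__ == "__main__":
    sample = (
        b"<data><record><id>1</id><name>a</name>"
        b"<last_checked>x</last_checked></record></data>"
    )
    buf = io.StringIO()
    dump_records(io.BytesIO(sample), buf)
    print(buf.getvalue(), end="")
```

With today's and yesterday's files dumped this way, the comparison itself can be left to the 'diff' utility (for example `diff yesterday.txt today.txt`, with file names chosen here only for illustration), and yesterday's dump can be kept so that only the new file needs converting. Field text containing tabs or newlines would break the one-line-per-record format, so real data may need escaping.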
Python to do CDC on XML files Bruce Kirk <bruce.kirk24@gmail.com> - 2016-03-23 13:16 -0700
Re: Python to do CDC on XML files Bob Gailer <bgailer@gmail.com> - 2016-03-23 16:47 -0400
Re: Python to do CDC on XML files Bruce Kirk <bruce.kirk24@gmail.com> - 2016-03-23 19:57 -0400
Re: Python to do CDC on XML files Chris Angelico <rosuav@gmail.com> - 2016-03-24 18:00 +1100
Re: Python to do CDC on XML files Peter Otten <__peter__@web.de> - 2016-03-24 09:19 +0100