Subject: Re: Is it normal to cry when given XML?
From: Chris Angelico
Date: Tue, 5 May 2015 19:44:38 +1000
Newsgroups: comp.lang.python
Cc: python-list@python.org

On Tue, May 5, 2015 at 7:28 PM, Sayth Renshaw wrote:
> Hi
>
> Just checking if the reaction to cry when given XML is normal.

It's not surprising, especially with bad XML structures.

> I thought maybe I am approaching it all wrong, using lxml largely or some XQuery to club it into submission.
>
> See, the usual goal is just to take the entire XML and push it into a database, or in future experiment with Mongo or HDF5.
>
> See, it's never basic XML; it usually comes from some database with a walk of tables and strange relationships.
>
> Am I doing it wrong? Is there a simple way I am missing?

Generally, I work with XML only as a transport layer, and most of the time it's for a document structure that would be better served by JSON anyway.
(This may mean that I have an unfairly negative view of XML, but that situation is extremely common.) My usual technique is to parse it into something native (usually a dictionary, and probably the same structure the other end constructed the XML from), then work with that.

For example, querying the ePond API [1] gives back a pile of XML data, so I might have a single function that performs a synchronous HTTP query, takes the response body, and parses it with a fairly generic XML parser like lxml. It then digs three levels into the resulting tree to pull out the bit that actually matters, leaving behind all the framing and stuff.

The less time you spend with actual XML, the better. XML is not the answer.

ChrisA

[1] A completely fictional web site, of course, and in no way implying that I have had a frustrating time with a well-known online sales/auction company.
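The parse-it-into-something-native technique described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the poster's actual code: it swaps in the standard library's xml.etree.ElementTree for lxml (lxml.etree offers similar fromstring/find calls) so it runs without third-party packages; the helper names element_to_native, extract_payload and fetch_payload are hypothetical; the Response/Body/Result/Items element names are invented placeholders, not any real API's schema; and XML attributes are simply dropped.

```python
import urllib.request
import xml.etree.ElementTree as ET


def element_to_native(elem):
    """Convert an Element into plain Python data (dicts, lists, strings).

    Simplifying assumptions: attributes are ignored, a leaf element becomes
    its stripped text, and repeated child tags are collected into a list.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_native(child)
        if child.tag in result:
            # Second or later occurrence of this tag: switch to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def extract_payload(body, path="Body/Result/Items"):
    """Parse an XML response body and keep only the part that matters.

    `path` digs three levels below the root element; the default tag names
    are hypothetical placeholders.
    """
    root = ET.fromstring(body)
    node = root.find(path)
    if node is None:
        raise ValueError(f"expected element {path!r} not found in response")
    return element_to_native(node)


def fetch_payload(url):
    """Hypothetical wrapper: synchronous HTTP query, then parse and unwrap."""
    with urllib.request.urlopen(url) as resp:
        return extract_payload(resp.read())


# Usage with a made-up response body (no network access needed):
SAMPLE = b"""<Response>
  <Header><RequestId>abc123</RequestId></Header>
  <Body>
    <Result>
      <Status>OK</Status>
      <Items>
        <Item><Title>Widget</Title><Price>9.99</Price></Item>
        <Item><Title>Gadget</Title><Price>4.50</Price></Item>
      </Items>
    </Result>
  </Body>
</Response>"""

items = extract_payload(SAMPLE)
print(items)
# → {'Item': [{'Title': 'Widget', 'Price': '9.99'}, {'Title': 'Gadget', 'Price': '4.50'}]}
```

Once the framing is discarded, the result is an ordinary dict that can be inserted into a database or dumped as JSON without touching XML again. Everything stays a string here; converting values such as prices to numbers is left to the caller.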