From: Saran Ahluwalia
Newsgroups: comp.lang.python
Subject: Re: Understanding how to quote XML string in order to serialize using Python's ElementTree
Date: Sat, 9 Jan 2016 18:13:29 -0500
References: <569184DA.4080009@gmail.com>

As mentioned previously, I assigned the variable *data* to the string
''0000',0001,0070,". When any utility attempts to parse this string (it is
one example of millions found in my actual data source), it raises that
error. To get that readout, you would need to comment out the following:

    data = row['E']
    print(data) # The before
    data = data[1:-1]
    data = ""'{}'"".format(data)
    print(data) # Sanity check

I am wondering whether there is a scalable solution to the above issue
(perhaps using some form of escape character or a regex?). I have tried many
ideas, but to no avail: there always seems to be an edge case that escapes
me.

Thanks.

On Sat, Jan 9, 2016 at 5:08 PM, Karim wrote:

>
>
> On 09/01/2016 21:54, kbtyo wrote:
>
>> My specs:
>>
>> Python 3.4.3
>> Windows 7
>> IDE is Jupyter Notebooks
>>
>> What I have referenced:
>>
>> 1)
>> http://stackoverflow.com/questions/1546717/python-escaping-strings-for-use-in-xml
>>
>> 2)
>> http://stackoverflow.com/questions/7802418/how-to-properly-escape-single-and-double-quotes
>>
>> 3)
>> http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python
>>
>>
>> Here is the data (in CSV format) and script, respectively, (I have tried
>> variations on serializing Column 'E' using both Sax and ElementTree):
>>
>> i)
>>
>> A,B,C,D,E,F,G,H,I,J
>> "3","8","1","> />2312> />285SChecking10","> TransactionID="2"
>> RequestType="HoldInquiry">0000',0001,0070,","1967-12-25
>> 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"
>>
>> ii)
>>
>> #!/usr/bin/python
>> # -*- coding: utf-8 -*-
>> import os.path
>> import sys
>> import csv
>> from io import StringIO
>> import xml.etree.cElementTree as ElementTree
>> from xml.etree.ElementTree import XMLParser
>> import xml
>> import xml.sax
>> from xml.sax import ContentHandler
>>
>> class MyHandler(xml.sax.handler.ContentHandler):
>>     def __init__(self):
>>         self._charBuffer = []
>>         self._result = []
>>
>>     def _getCharacterData(self):
>>         data = ''.join(self._charBuffer).strip()
>>         self._charBuffer = []
>>         return data.strip() #remove strip() if whitespace is important
>>
>>     def parse(self, f):
>>         xml.sax.parse(f, self)
>>         return self._result
>>
>>     def characters(self, data):
>>         self._charBuffer.append(data)
>>
>>     def startElement(self, name, attrs):
>>         if name == 'Response':
>>             self._result.append({})
>>
>>     def endElement(self, name):
>>         if not name == 'Response': self._result[-1][name] = self._getCharacterData()
>>
>> def read_data(path):
>>     with open(path, 'rU', encoding='utf-8') as data:
>>         reader = csv.DictReader(data, delimiter =',', quotechar="'",
>>                                 skipinitialspace=True)
>>         for row in reader:
>>             yield row
>>
>> if __name__ == "__main__":
>>     empty = ''
>>     Response = 'sample.csv'
>>     for idx, row in enumerate(read_data(Response)):
>>         if idx > 10: break
>>         data = row['E']
>>         print(data) # The before
>>         data = data[1:-1]
>>         data = ""'{}'"".format(data)
>>         print(data) # Sanity check
>>         # data = '> RequestType="HoldInquiry">0000',0001,0070,'
>>         try:
>>             root = ElementTree.XML(data)
>>             # print(root)
>>         except StopIteration:
>>             raise
>>             pass
>>         # xmlstring = StringIO(data)
>>         # print(xmlstring)
>>         # Handler = MyHandler().parse(xmlstring)
>>
>>
>> Specifically, due to the quoting in the CSV file (which is beyond my
>> control), I have had to resort to slicing the string (line 51) and then
>> formatting it (line 52).
>>
>> However the print out from the above attempt is as follows:
>>
>> "0000'
>> 0000
>>
>> File "", line unknown
>> ParseError: no element found: line 1, column 69
>>
>> Interestingly - if I assign the variable "data" (as in line 54) I receive
>> this:
>>
>> File "", line 56
>> data = '> RequestType="HoldInquiry">0000',0001,0070,'
>> ^
>> SyntaxError: invalid token
>>
>> I seek feedback and information on how to address utilizing the most
>> Pythonic means to do so. Ideally, is there a method that can leverage
>> ElementTree. Thank you, in advance, for your feedback and guidance.
>>
>
> I don't understand because this line 54 gives:
>
> >>> import xml.etree.cElementTree as ElementTree
> >>> data = ' RequestType="HoldInquiry">0000',0001,0070,'
> File "", line 1
> data = ' RequestType="HoldInquiry">0000',0001,0070,'
> ^
> SyntaxError: invalid syntax
>
>
> BUT IF you correct the string and remove the inner quote after 0000
> everything's fine:
> >>> data = ' RequestType="HoldInquiry">0000,0001,0070,
> '
> >>> root = ElementTree.XML(data)
> >>> root
>
> Karim
>
>
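
One possible direction for the question above, offered as a sketch rather than a confirmed fix: the quoted script reads the file with quotechar="'", while the sample row wraps its fields in double quotes. If the file follows the usual CSV convention (fields wrapped in double quotes, embedded double quotes doubled), which the post alone cannot confirm, then the default quotechar lets the csv module hand back column E intact, and the apostrophe after 0000 is ordinary character data as far as ElementTree is concerned. The element name Code and the helpers make_sample_csv and parse_column_e are hypothetical; the real tag names did not survive in the post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for column E: the real element names were lost
# from the post, so "Code" is invented purely for illustration.
XML_FIELD = (
    '<Response TransactionID="2" RequestType="HoldInquiry">'
    "<Code>0000',0001,0070,</Code></Response>"
)


def make_sample_csv():
    """Build a small CSV in memory, letting csv.writer do the quoting.

    Assumption: the real file uses the same convention (double-quoted
    fields, embedded double quotes doubled).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["A", "E"])
    writer.writerow(["3", XML_FIELD])
    return buf.getvalue()


def parse_column_e(csv_text):
    """Yield one parsed XML root per row, using the default quotechar '"'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # No slicing or re-quoting: the csv module has already removed
        # the outer quotes and un-doubled the inner ones.
        yield ET.fromstring(row["E"])


if __name__ == "__main__":
    for root in parse_column_e(make_sample_csv()):
        print(root.get("RequestType"), repr(root.find("Code").text))
```

The SyntaxError in the interactive session comes from the unescaped apostrophe inside a single-quoted Python literal, not from the XML itself; reading the value from the file avoids the literal entirely, so no regex or manual escaping is needed for that character.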