Understanding how to quote XML string in order to serialize using Python's ElementTree

Started by: kbtyo <ahlusar.ahluwalia@gmail.com>
First post: 2016-01-09 12:54 -0800
Last post: 2016-01-10 00:30 +0100
Articles: 6 (3 participants)

#101412 — Understanding how to quote XML string in order to serialize using Python's ElementTree

From: kbtyo <ahlusar.ahluwalia@gmail.com>
Date: 2016-01-09 12:54 -0800
Subject: Understanding how to quote XML string in order to serialize using Python's ElementTree
Message-ID: <d0a2acdb-857c-47c5-a28d-422a8fc4cc74@googlegroups.com>

My specs:

Python 3.4.3
Windows 7
IDE is Jupyter Notebooks

What I have referenced:

1) http://stackoverflow.com/questions/1546717/python-escaping-strings-for-use-in-xml

2)
http://stackoverflow.com/questions/7802418/how-to-properly-escape-single-and-double-quotes

3)http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python


Here are the data (in CSV format) and the script, respectively. I have tried variations on serializing column 'E' using both SAX and ElementTree:

i)

A,B,C,D,E,F,G,H,I,J
"3","8","1","<Request TransactionID="3" RequestType="FOO"><InstitutionISO /><CallID>23</CallID><MemberID>12</MemberID><MemberPassword /><RequestData><AccountNumber>2</AccountNumber><AccountSuffix>85</AccountSuffix><AccountType>S</AccountType><MPIAcctType>Checking</MPIAcctType><TransactionCount>10</TransactionCount></RequestData></Request>","<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>","1967-12-25 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"

ii)

#!/usr/bin/python
# -*-  coding: utf-8 -*-
import os.path
import sys
import csv
from io import StringIO 
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import xml
import xml.sax
from xml.sax import ContentHandler

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self._charBuffer = []
        self._result = []

    def _getCharacterData(self):
        data = ''.join(self._charBuffer).strip()
        self._charBuffer = []
        return data.strip() #remove strip() if whitespace is important

    def parse(self, f):
        xml.sax.parse(f, self)
        return self._result
    

    def characters(self, data):
        self._charBuffer.append(data)

    def startElement(self, name, attrs):
        if name == 'Response':
            self._result.append({})

    def endElement(self, name):
        if not name == 'Response': self._result[-1][name] = self._getCharacterData()

def read_data(path):
    with open(path, 'rU', encoding='utf-8') as data:
        reader = csv.DictReader(data, delimiter =',', quotechar="'", skipinitialspace=True)
        for row in reader:
            yield row

if __name__ == "__main__":
    empty = ''
    Response = 'sample.csv'
    for idx, row in enumerate(read_data(Response)):
        if idx > 10: break
        data = row['E']
        print(data) # The before
        data = data[1:-1]
        data = ""'{}'"".format(data)
        print(data) # Sanity check 
#         data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
        try:
            root = ElementTree.XML(data)
#             print(root)
        except StopIteration:
            raise
            pass
#         xmlstring = StringIO(data)
#         print(xmlstring)
#         Handler = MyHandler().parse(xmlstring)


Specifically, due to the quoting in the CSV file (which is beyond my control), I have had to resort to slicing the string (line 51 of the script, data = data[1:-1]) and then formatting it (line 52, the .format() call).

However the print out from the above attempt is as follows:

"<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000'
<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000

  File "<string>", line unknown
ParseError: no element found: line 1, column 69

Interestingly, if I instead assign the variable "data" directly (by uncommenting line 54 of the script), I receive this:

  File "<ipython-input-80-7357c9272b92>", line 56
data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
                     ^
SyntaxError: invalid token

I am looking for feedback on the most Pythonic way to handle this quoting problem. Ideally, is there an approach that can leverage ElementTree? Thank you in advance for your feedback and guidance.
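
For readers following along, the truncated output in this post can be reproduced with a minimal sketch. It assumes a shortened version of the sample row (the names line, row, truncated and template are illustrative, not from the script). It suggests two things: with quotechar="'" the double quotes are treated as ordinary characters, so the reader splits the XML at the first comma inside <ShareList>; and ""'{}'"" is simply the string '{}' after adjacent string literals are concatenated, so the formatting step leaves the data unchanged.

```python
import csv

# Shortened, hypothetical version of the sample row from the post.
line = '''"3","<Response TransactionID="2"><ShareList>0000',0001,0070,</ShareList></Response>","0"'''

# Same reader settings as read_data() in the script.
row = next(csv.reader([line], delimiter=',', quotechar="'", skipinitialspace=True))

# The double quotes are plain characters here, so every comma ends a field.
truncated = row[1]
print(len(row))    # the XML has been split across several fields
print(truncated)   # stops at the stray apostrophe after 0000

# Adjacent string literals are concatenated: "" + '{}' + "" is just '{}'.
template = ""'{}'""
sliced = truncated[1:-1]
print(template.format(sliced) == sliced)
```

Because the field is already cut short before ElementTree sees it, the parser reaches the end of input inside <ShareList>, which is consistent with the "no element found" error.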

#101414

From: Karim <kliateni@gmail.com>
Date: 2016-01-09 23:08 +0100
Message-ID: <mailman.100.1452377311.2305.python-list@python.org>
In reply to: #101412

I don't understand, because line 54 on its own gives:

 >>> import xml.etree.cElementTree as ElementTree
 >>> data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
   File "<stdin>", line 1
     data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
^
SyntaxError: invalid syntax


But if you correct the string and remove the inner quote after 0000, everything is fine:

 >>> data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000,0001,0070,</ShareList></Response>'
 >>> root = ElementTree.XML(data)
 >>> root
<Element 'Response' at 0x7f0fb6dce330>

Karim
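
As a side note, the SyntaxError comes from the Python literal rather than from the XML: the apostrophe after 0000 ends the single-quoted string early. A minimal sketch, assuming the apostrophe is genuine data that must be kept, is to use a triple-quoted literal instead. It uses xml.etree.ElementTree rather than cElementTree, since the latter was removed in Python 3.9; the name share_list is illustrative.

```python
import xml.etree.ElementTree as ElementTree

# Triple quotes let the literal contain both " and ' without escaping.
data = '''<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'''

# An apostrophe is legal in XML character data, so this parses as-is.
root = ElementTree.XML(data)
share_list = root.findtext('ShareList')
print(root.tag, root.get('TransactionID'), share_list)
```

This suggests the XML itself is well formed and that the trouble starts earlier, when the CSV row is split into fields.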

#101415

From: Karim <kliateni@gmail.com>
Date: 2016-01-09 23:23 +0100
Message-ID: <mailman.101.1452378242.2305.python-list@python.org>
In reply to: #101412

In fact, to get rid of the extra double quotes, simply create your csv reader like this:

reader = csv.DictReader(data, dialect='excel', skipinitialspace=True)

You should then no longer need to slice the data variable and reformat it.

Karim
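
A minimal sketch of how the excel dialect treats a quoted field, under one important assumption: the embedded double quotes are doubled ("") in the file, as that dialect expects. The sample row posted in the thread does not double them, so for that exact data the reader is not guaranteed to keep the XML in a single field. The column names and the shortened row below are hypothetical.

```python
import csv
import io

# Hypothetical, well-formed CSV: embedded double quotes are doubled.
sample = io.StringIO(
    'A,E,J\n'
    '"3","<Response TransactionID=""2""><ShareList>0000\',0001,</ShareList></Response>","0"\n'
)

reader = csv.DictReader(sample, dialect='excel', skipinitialspace=True)
rows = list(reader)

# Commas and the apostrophe inside the quoted field are preserved.
xml_text = rows[0]['E']
print(xml_text)
```

With input quoted this way, no slicing or reformatting is needed before the string is handed to ElementTree.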

#101418

From: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
Date: 2016-01-09 18:13 -0500
Message-ID: <mailman.103.1452381230.2305.python-list@python.org>
In reply to: #101412

As mentioned previously, I assigned the variable *data* to the string
'<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'.
When any utility attempts to parse this string (it is one example of
millions found in my actual data source), you receive that error. You
would need to comment out the following:

        data = row['E']
        print(data) # The before
        data = data[1:-1]
        data = ""'{}'"".format(data)
        print(data) # Sanity check

to achieve the readout.

Unfortunately, I am wondering if there is a scalable solution to the above
issue (perhaps using some form of escape character or regex?). I have ideas
and have tried many but to no avail. There always seems to be an edge case
that escapes me. Thanks.
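
One possible direction for the regex idea, offered as a minimal sketch rather than a complete solution: skip the csv module for this column and pull the <Response> element straight out of the raw line before handing it to ElementTree. The pattern and the helper name extract_response are hypothetical, and the sketch assumes each line holds at most one <Response> element and that the text "</Response>" never occurs inside the payload.

```python
import re
import xml.etree.ElementTree as ElementTree

# Hypothetical pattern: first "<Response" up to the nearest "</Response>".
RESPONSE_RE = re.compile(r'<Response\b.*?</Response>', re.DOTALL)


def extract_response(raw_line):
    """Return the parsed <Response> element of a raw CSV line, or None."""
    match = RESPONSE_RE.search(raw_line)
    if match is None:
        return None
    try:
        return ElementTree.fromstring(match.group(0))
    except ElementTree.ParseError:
        # Leave malformed payloads to the caller instead of guessing a repair.
        return None


# Shortened version of the sample row, read as one unparsed line.
raw_line = '''"3","8","1","<Request TransactionID="3" RequestType="FOO"><CallID>23</CallID></Request>","<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>","1967-12-25 22:18:13.471000","2","70","0"'''

root = extract_response(raw_line)
print(root.get('TransactionID'), root.findtext('ShareList'))
```

Returning None for lines that do not match or do not parse keeps the edge cases visible, so they can be counted and inspected separately across a large file.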

#101419

From: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
Date: 2016-01-09 18:15 -0500
Message-ID: <mailman.104.1452381330.2305.python-list@python.org>
In reply to: #101412

Thank you for the feedback on this. I believe that the excel dialect
includes just that:

class excel(Dialect):
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL


#101420

From: Karim <kliateni@gmail.com>
Date: 2016-01-10 00:30 +0100
Message-ID: <mailman.105.1452382257.2305.python-list@python.org>
In reply to: #101412

Yes, it changes your quotechar = "'" into quotechar = '"'.

You should no longer get the extra double quotes around the data string,
and you no longer need the slicing step.

Karim
