| Started by | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| First post | 2016-01-09 12:54 -0800 |
| Last post | 2016-01-10 00:30 +0100 |
| Articles | 6 — 3 participants |
Understanding how to quote XML string in order to serialize using Python's ElementTree kbtyo <ahlusar.ahluwalia@gmail.com> - 2016-01-09 12:54 -0800
Re: Understanding how to quote XML string in order to serialize using Python's ElementTree Karim <kliateni@gmail.com> - 2016-01-09 23:08 +0100
Re: Understanding how to quote XML string in order to serialize using Python's ElementTree Karim <kliateni@gmail.com> - 2016-01-09 23:23 +0100
Re: Understanding how to quote XML string in order to serialize using Python's ElementTree Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2016-01-09 18:13 -0500
Re: Understanding how to quote XML string in order to serialize using Python's ElementTree Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2016-01-09 18:15 -0500
Re: Understanding how to quote XML string in order to serialize using Python's ElementTree Karim <kliateni@gmail.com> - 2016-01-10 00:30 +0100
| From | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2016-01-09 12:54 -0800 |
| Subject | Understanding how to quote XML string in order to serialize using Python's ElementTree |
| Message-ID | <d0a2acdb-857c-47c5-a28d-422a8fc4cc74@googlegroups.com> |
My specs:
Python 3.4.3
Windows 7
IDE is Jupyter Notebooks
What I have referenced:
1) http://stackoverflow.com/questions/1546717/python-escaping-strings-for-use-in-xml
2) http://stackoverflow.com/questions/7802418/how-to-properly-escape-single-and-double-quotes
3) http://stackoverflow.com/questions/4972210/escaping-characters-in-a-xml-file-with-python
Here are the data (in CSV format) and the script, respectively. I have tried variations on serializing column 'E' using both SAX and ElementTree:
i)
A,B,C,D,E,F,G,H,I,J
"3","8","1","<Request TransactionID="3" RequestType="FOO"><InstitutionISO /><CallID>23</CallID><MemberID>12</MemberID><MemberPassword /><RequestData><AccountNumber>2</AccountNumber><AccountSuffix>85</AccountSuffix><AccountType>S</AccountType><MPIAcctType>Checking</MPIAcctType><TransactionCount>10</TransactionCount></RequestData></Request>","<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>","1967-12-25 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"
ii)
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os.path
import sys
import csv
from io import StringIO
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import xml
import xml.sax
from xml.sax import ContentHandler

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self._charBuffer = []
        self._result = []

    def _getCharacterData(self):
        data = ''.join(self._charBuffer).strip()
        self._charBuffer = []
        return data.strip() #remove strip() if whitespace is important

    def parse(self, f):
        xml.sax.parse(f, self)
        return self._result


    def characters(self, data):
        self._charBuffer.append(data)

    def startElement(self, name, attrs):
        if name == 'Response':
            self._result.append({})

    def endElement(self, name):
        if not name == 'Response': self._result[-1][name] = self._getCharacterData()

def read_data(path):
    with open(path, 'rU', encoding='utf-8') as data:
        reader = csv.DictReader(data, delimiter =',', quotechar="'", skipinitialspace=True)
        for row in reader:
            yield row

if __name__ == "__main__":
    empty = ''
    Response = 'sample.csv'
    for idx, row in enumerate(read_data(Response)):
        if idx > 10: break
        data = row['E']
        print(data) # The before
        data = data[1:-1]
        data = ""'{}'"".format(data)
        print(data) # Sanity check
        # data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
        try:
            root = ElementTree.XML(data)
            # print(root)
        except StopIteration:
            raise
            pass
        # xmlstring = StringIO(data)
        # print(xmlstring)
        # Handler = MyHandler().parse(xmlstring)
Specifically, due to the quoting in the CSV file (which is beyond my control), I have had to resort to slicing the string (line 51) and then formatting it (line 52).
However, the printout from the above attempt is as follows:
"<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000'
<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000
File "<string>", line unknown
ParseError: no element found: line 1, column 69
Interestingly, if I assign the variable "data" directly (as in line 54), I receive this:
File "<ipython-input-80-7357c9272b92>", line 56
data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
^
SyntaxError: invalid token
I am looking for feedback on the most Pythonic way to address this. Ideally, is there a method that can leverage ElementTree? Thank you in advance for your feedback and guidance.
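The truncated output above can be reproduced in isolation. The following is only a sketch: it assumes the sample row is read with the same `quotechar="'"` setting as the script, it shortens column D to `<Request />`, and the names `SAMPLE`, `field_e`, `trimmed` and `error` are illustrative rather than taken from the original script. It uses `xml.etree.ElementTree` because the `cElementTree` alias is not available in newer Python versions. With that `quotechar`, the double quotes are ordinary characters, so the first comma inside `<ShareList>` ends column E.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Header plus a shortened copy of the sample row (column D abbreviated).
SAMPLE = (
    'A,B,C,D,E,F,G,H,I,J\n'
    '"3","8","1","<Request />",'
    '"<Response TransactionID="2" RequestType="HoldInquiry">'
    "<ShareList>0000',0001,0070,</ShareList></Response>\","
    '"1967-12-25 22:18:13.471000","2005-12-25 22:18:13.768000","2","70","0"\n'
)

# Same reader settings as read_data() in the script.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=',', quotechar="'",
                        skipinitialspace=True)
row = next(reader)

field_e = row['E']       # ends at the first comma inside <ShareList>
print(field_e)

trimmed = field_e[1:-1]  # the slicing step from the script
print(trimmed)

error = None
try:
    ET.XML(trimmed)      # the closing tags are missing, so parsing fails
except ET.ParseError as exc:
    error = exc
    print('ParseError:', exc)
```

The rest of the XML ends up in the following columns, which suggests the problem is in how the row is split rather than in ElementTree itself.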
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2016-01-09 23:08 +0100 |
| Message-ID | <mailman.100.1452377311.2305.python-list@python.org> |
| In reply to | #101412 |
I don't understand, because line 54 on its own gives:
>>> import xml.etree.cElementTree as ElementTree
>>> data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
  File "<stdin>", line 1
    data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'
    ^
SyntaxError: invalid syntax
But if you correct the string and remove the inner quote after 0000, everything is fine:
>>> data = '<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000,0001,0070,</ShareList></Response>'
>>> root = ElementTree.XML(data)
>>> root
<Element 'Response' at 0x7f0fb6dce330>
Karim
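As a side note to the session above, the SyntaxError comes from the Python string literal, not from the XML: the apostrophe after 0000 closes a single-quoted literal early. A minimal sketch, assuming the apostrophe is meant to stay in the data, escapes it in the literal instead of removing it. An apostrophe is legal in XML text content, so ElementTree accepts the string as it is. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Single-quoted literal: only the apostrophe after 0000 needs a backslash.
data = (
    '<Response TransactionID="2" RequestType="HoldInquiry">'
    '<ShareList>0000\',0001,0070,</ShareList></Response>'
)

root = ET.XML(data)
share_list = root.find('ShareList').text

print(root.tag, root.attrib['RequestType'])
print(share_list)  # the apostrophe is preserved in the element text
```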
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2016-01-09 23:23 +0100 |
| Message-ID | <mailman.101.1452378242.2305.python-list@python.org> |
| In reply to | #101412 |
In fact, to get rid of the double quotes, simply create your csv reader like this:
    reader = csv.DictReader(data, dialect='excel', skipinitialspace=True)
You should then no longer need to slice the data variable and reformat it.
Karim
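A small sketch of what the default 'excel' dialect does, under an assumption that may not hold for the real file: the input below is hypothetical and well formed, meaning that double quotes inside a quoted field are doubled. In that case the reader strips the outer quotes, restores the inner ones, and keeps embedded commas inside the field. If the real file leaves the inner quotes undoubled, as the sample row in this thread appears to, the reader may still split the field at the embedded commas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed input: quotes inside a quoted field are doubled.
WELL_FORMED = (
    'A,B\n'
    '"3","<Response TransactionID=""2""><ShareList>0000\',0001,</ShareList></Response>"\n'
)

reader = csv.DictReader(io.StringIO(WELL_FORMED))  # default dialect is 'excel'
row = next(reader)

xml_text = row['B']  # outer quotes removed, doubled quotes collapsed
print(xml_text)

root = ET.XML(xml_text)
print(root.find('ShareList').text)
```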
| From | Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2016-01-09 18:13 -0500 |
| Message-ID | <mailman.103.1452381230.2305.python-list@python.org> |
| In reply to | #101412 |
As mentioned previously, I assigned the variable data to the string
'<Response TransactionID="2" RequestType="HoldInquiry"><ShareList>0000',0001,0070,</ShareList></Response>'.
When any utility attempts to parse this string (it is one example of millions
found in my actual data source), it raises that error. To reproduce the
readout, you would need to comment out the following:
    data = row['E']
    print(data) # The before
    data = data[1:-1]
    data = ""'{}'"".format(data)
    print(data) # Sanity check
I am wondering whether there is a scalable solution to the above issue
(perhaps using some form of escape character or regex?). I have ideas and
have tried many, but to no avail. There always seems to be an edge case
that escapes me. Thanks.
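One possible shape for the regex idea, offered as a sketch under stated assumptions rather than a tested solution for the full data set: it bypasses the csv module for this column and pulls the `<Response>...</Response>` fragment straight out of each raw line. It assumes each record sits on one line, that `Response` elements are never nested or self-closing, and that the fragment is otherwise well-formed XML. The names `RESPONSE_RE`, `iter_responses` and `raw_line` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Lazily match from an opening <Response ...> tag to the first closing tag.
RESPONSE_RE = re.compile(r'<Response\b.*?</Response>', re.DOTALL)


def iter_responses(lines):
    """Yield one parsed element per <Response> fragment found in the lines."""
    for line in lines:
        for match in RESPONSE_RE.finditer(line):
            try:
                yield ET.XML(match.group(0))
            except ET.ParseError:
                # Skip fragments that are not well-formed XML on their own.
                continue


# Shortened copy of the sample row (column D abbreviated).
raw_line = (
    '"3","8","1","<Request />",'
    '"<Response TransactionID="2" RequestType="HoldInquiry">'
    "<ShareList>0000',0001,0070,</ShareList></Response>\","
    '"1967-12-25 22:18:13.471000"'
)

roots = list(iter_responses([raw_line]))
for root in roots:
    print(root.get('TransactionID'), root.find('ShareList').text)
```

For a real file, an open file object can be passed in place of the list, since iterating over it yields one line at a time.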
| From | Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2016-01-09 18:15 -0500 |
| Message-ID | <mailman.104.1452381330.2305.python-list@python.org> |
| In reply to | #101412 |
Thank you for the feedback on this. I believe that the excel dialect
includes just that:
class excel(Dialect):
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL
| From | Karim <kliateni@gmail.com> |
|---|---|
| Date | 2016-01-10 00:30 +0100 |
| Message-ID | <mailman.105.1452382257.2305.python-list@python.org> |
| In reply to | #101412 |
Yes, it changes your quotechar = "'" into quotechar = '"'.
You should no longer get the double quoting of the data string, and the
slicing step is no longer needed.
Karim