Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #93043 > unrolled thread
| Started by | Sahlusar <ahlusar.ahluwalia@gmail.com> |
|---|---|
| First post | 2015-06-23 10:15 -0700 |
| Last post | 2015-06-25 09:37 -0700 |
| Articles | 8 — 4 participants |
Back to article view | Back to comp.lang.python
To write headers once with different values in separate row in CSV Sahlusar <ahlusar.ahluwalia@gmail.com> - 2015-06-23 10:15 -0700
Re: To write headers once with different values in separate row in CSV John Gordon <gordon@panix.com> - 2015-06-23 19:12 +0000
Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 04:31 -0700
Re: To write headers once with different values in separate row in CSV Steven D'Aprano <steve@pearwood.info> - 2015-06-24 11:50 +1000
Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 04:37 -0700
Re: To write headers once with different values in separate row in CSV Steven D'Aprano <steve@pearwood.info> - 2015-06-24 22:38 +1000
Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 05:52 -0700
Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-25 09:37 -0700
| From | Sahlusar <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-06-23 10:15 -0700 |
| Subject | To write headers once with different values in separate row in CSV |
| Message-ID | <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> |
I have the following script for writing out to CSV two items in a list to a CSV in such a format, such that if we have:
L = [('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('D', 8)]
we want
A B C D
1 2 3 4
5 6 8
And with this
L = [('A', 1), ('B', 2), ('C', 3), ('D', 4),('D', 5), ('B', 6), ('A', 8)]
A B C D
1 2 3 4
8 6 5
I have this script:
def makerows(pairs):
headers = []
columns = {}
for k, v in pairs:
if k in columns:
columns[k].extend((v,))
else:
headers.append(k)
columns[k] = [k, v]
m = max(len(c) for c in columns.values())
for c in columns.values():
c.extend('' for i in range(len(c), m))
L = [columns[k] for k in headers]
rows = list(zip(*L))
return rows
//With this test//
if __name__ == '__main__':
lists = [
[('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('C', 7), ('D', 8)],
[('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('D', 8)],
[('A', 1), ('B', 2), ('C', 3), ('D', 4),('D', 5), ('B', 6), ('A', 8)],
]
from pprint import pprint
for data in lists:
print(data)
pprint(makerows(data))
output:
[('A', 1), ('B', 2), ('C', 3), ('D', 4), ('A', 5), ('B', 6), ('C', 7), ('D', 8)]
[('A', 'B', 'C', 'D'), (1, 2, 3, 4), (5, 6, 7, 8)]
[('A', 1), ('B', 2), ('C', 3), ('D', 4), ('A', 5), ('B', 6), ('D', 8)]
[('A', 'B', 'C', 'D'), (1, 2, 3, 4), (5, 6, '', 8)]
[('A', 1), ('B', 2), ('C', 3), ('D', 4), ('D', 5), ('B', 6), ('A', 8)]
[('A', 'B', 'C', 'D'), (1, 2, 3, 4), (8, 6, '', 5)]
However, when I extrapolate this same logic with a list like:
('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress.TimeAtPreviousAddress.', None), where the headers/columns are the first item (only to be written out once) with different values. I receive an output CSV with repeating headers and values all printed in one long string (when opened in an application like Excel).
I use this script:
def makerows(pairs):
headers = []
columns = {}
for k, v in pairs:
if k in columns:
columns[k].extend((v,))
else:
headers.append(k)
columns[k] = [k, v]
m = max(len(c) for c in columns.values())
for c in columns.values():
c.extend(' ' for i in range(len(c), m))
L = [columns[k] for k in headers]
rows = list(zip(*L))
return rows
def main():
with open('sample.xml', 'r', encoding='utf-8') as f:
xml_string = f.read()
xml_string= xml_string.replace('�', '') #optional to remove ampersands.
root = ElementTree.XML(xml_string)
for item in root:
print(root)
writer = csv.writer(open("test_out.csv", 'wt'))
writer.writerows(makerows(flatten_dict(root)))
if __name__ == "__main__":
main()
************FYI, flatten_dict() parses XML elements and their text in to key, value pairs and then converts them into a list. That is not the underlying issue. Any thoughts or suggestions would be very helpful. Thank you.
[toc] | [next] | [standalone]
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2015-06-23 19:12 +0000 |
| Message-ID | <mmcb2s$dcm$1@reader1.panix.com> |
| In reply to | #93043 |
In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:
> However, when I extrapolate this same logic with a list like:
> ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> .TimeAtPreviousAddress.', None), where the headers/columns are the first
> item (only to be written out once) with different values. I receive an
> output CSV with repeating headers and values all printed in one long string
First, I would try to determine if the problem is in the makerows()
function, or if the problem is elsewhere.
Have you tried creating some dummy data by hand and seeing how makerows()
handles it?
(By the way, if your post had included some sample data that illustrates
the problem, it would have been much easier to figure out a solution.
Instead, we are left guessing at your XML format, and at the particular
implementation of flatten_dict().)
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
[toc] | [prev] | [next] | [standalone]
| From | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-06-24 04:31 -0700 |
| Message-ID | <3e93574e-a6b5-4ca4-8bba-c37433848f1a@googlegroups.com> |
| In reply to | #93045 |
On Tuesday, June 23, 2015 at 3:12:40 PM UTC-4, John Gordon wrote:
> In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:
>
> > However, when I extrapolate this same logic with a list like:
>
> > ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> > .TimeAtPreviousAddress.', None), where the headers/columns are the first
> > item (only to be written out once) with different values. I receive an
> > output CSV with repeating headers and values all printed in one long string
>
> First, I would try to determine if the problem is in the makerows()
> function, or if the problem is elsewhere.
>
> Have you tried creating some dummy data by hand and seeing how makerows()
> handles it?
>
> (By the way, if your post had included some sample data that illustrates
> the problem, it would have been much easier to figure out a solution.
> Instead, we are left guessing at your XML format, and at the particular
> implementation of flatten_dict().)
>
> --
> John Gordon A is for Amy, who fell down the stairs
> gordon@panix.com B is for Basil, assaulted by bears
> -- Edward Gorey, "The Gashlycrumb Tinies"
On Tuesday, June 23, 2015 at 3:12:40 PM UTC-4, John Gordon wrote:
> In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:
>
> > However, when I extrapolate this same logic with a list like:
>
> > ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> > .TimeAtPreviousAddress.', None), where the headers/columns are the first
> > item (only to be written out once) with different values. I receive an
> > output CSV with repeating headers and values all printed in one long string
>
> First, I would try to determine if the problem is in the makerows()
> function, or if the problem is elsewhere.
>
> Have you tried creating some dummy data by hand and seeing how makerows()
> handles it?
>
Yes I did do this.
> (By the way, if your post had included some sample data that illustrates
> the problem, it would have been much easier to figure out a solution.
> Instead, we are left guessing at your XML format, and at the particular
> implementation of flatten_dict().)
Yes, unfortunately, due to NDA protocols I cannot share this.
>
> --
> John Gordon A is for Amy, who fell down the stairs
> gordon@panix.com B is for Basil, assaulted by bears
> -- Edward Gorey, "The Gashlycrumb Tinies"
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-24 11:50 +1000 |
| Message-ID | <558a0cef$0$1670$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #93043 |
On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote: > That is not the underlying issue. Any thoughts or suggestions would be > very helpful. Thank you for spending over 100 lines to tell us what is NOT the underlying issue. I will therefore tell you what is NOT the solution to your problem (whatever it is, since I can't tell). The solution is NOT to squeeze lemon juice into your keyboard. If someday you feel like telling us what the issue actually IS, instead of what it IS NOT, then perhaps we will have a chance to help you find a solution. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-06-24 04:37 -0700 |
| Message-ID | <22c60270-d8e8-43eb-b58b-b6d86dc6732e@googlegroups.com> |
| In reply to | #93061 |
On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote: > On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote: > > > That is not the underlying issue. Any thoughts or suggestions would be > > very helpful. > > > Thank you for spending over 100 lines to tell us what is NOT the underlying > issue. I will therefore tell you what is NOT the solution to your problem > (whatever it is, since I can't tell). The solution is NOT to squeeze lemon > juice into your keyboard. > > If someday you feel like telling us what the issue actually IS, instead of > what it IS NOT, then perhaps we will have a chance to help you find a > solution. > > > > -- > Steven Curious - what should I have provided? Detailed and constructive feedback (like your reply to my post regarding importing functions) is more useful than to "squeeze lemon juice" into one's keyboard.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-24 22:38 +1000 |
| Message-ID | <558aa4b1$0$1671$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #93074 |
On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote:
> On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
>> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
>>
>> > That is not the underlying issue. Any thoughts or suggestions would be
>> > very helpful.
>>
>>
>> Thank you for spending over 100 lines to tell us what is NOT the
>> underlying issue. I will therefore tell you what is NOT the solution to
>> your problem (whatever it is, since I can't tell). The solution is NOT to
>> squeeze lemon juice into your keyboard.
>>
>> If someday you feel like telling us what the issue actually IS, instead
>> of what it IS NOT, then perhaps we will have a chance to help you find a
>> solution.
>>
>>
>>
>> --
>> Steven
>
> Curious - what should I have provided?
To start with, you should tell us what is the problem you are having. You
gave us some code, and then said "That is not the underlying issue". Okay,
so what is the underlying issue? What is the problem you want help solving?
In another post, you responded to John Gordon's question:
# John
Have you tried creating some dummy data by hand and seeing
how makerows() handles it?
by answering:
Yes I did do this.
Okay. What was the result? Do you want us to guess what result you got?
John also suggested that you provide sample data, and an implementation of
flatten_dict, and your answer is:
Yes, unfortunately, due to NDA protocols I cannot share this.
You don't have to provide your *actual* data. You can provide *sample* data,
that does not contain any of your actual confidential values. If your XML
file looks like this:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
you can replace the data:
<?xml version="1.0"?>
<catalog>
<book id="1111">
<author>Smith, John</author>
<title>ABCDEF</title>
<genre>Widgets</genre>
<price>9999.99</price>
<publish_date>1900-01-01</publish_date>
<description>blah blah blah blah</description>
</book>
</catalog>
You can even change the tags:
<?xml version="1.0"?>
<whatzit>
<spam id="1111">
<a>Smith, John</a>
<b>ABCDEF</b>
<c>Widgets</c>
<d>9999.99</d>
<e>1900-01-01</e>
<f>blah blah blah blah</f>
</spam>
</whatzit>
If you're still worried that the sample XML has the same structure as your
real data, you can remove some fields and add new ones:
<?xml version="1.0"?>
<whatzit>
<spam id="1111">
<b>ABCDEF</b>
<d>9999.99</d>
<e>1900-01-01</e>
<z>fe fi fo fum</z>
<f>blah blah blah blah</f>
</spam>
</whatzit>
If you can't share the flatten_dict() function, either:
(1) get permission to share it from your manager or project leader.
flatten_dict is not a trade secret or valuable in any way, and
half-competent Python programmer can probably come up with two or three
different ways to flatten a dict in five minutes. They're all going to look
more or less the same, because there's only so many ways to flatten a dict.
(2) Or accept that we can't help you, and deal with it on your own.
> Detailed and constructive feedback
> (like your reply to my post regarding importing functions) is more useful
> than to "squeeze lemon juice" into one's keyboard.
Of course. That is why I said it was NOT the solution. Don't waste your time
squeezing lemon juice over your keyboard, it won't solve your problem.
But you can't expect us to guess what your problem is, or debug code we
can't see, or read your mind and understand your data.
Before you ask any more questions, please read this:
http://sscce.org/
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-06-24 05:52 -0700 |
| Message-ID | <a0bf011e-dbae-4787-9525-ebad6e42fbf8@googlegroups.com> |
| In reply to | #93077 |
On Wednesday, June 24, 2015 at 8:38:24 AM UTC-4, Steven D'Aprano wrote: > On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote: > > > On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote: > >> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote: > >> > >> > That is not the underlying issue. Any thoughts or suggestions would be > >> > very helpful. > >> > >> > >> Thank you for spending over 100 lines to tell us what is NOT the > >> underlying issue. I will therefore tell you what is NOT the solution to > >> your problem (whatever it is, since I can't tell). The solution is NOT to > >> squeeze lemon juice into your keyboard. > >> > >> If someday you feel like telling us what the issue actually IS, instead > >> of what it IS NOT, then perhaps we will have a chance to help you find a > >> solution. > >> > >> > >> > >> -- > >> Steven > > > > Curious - what should I have provided? > > To start with, you should tell us what is the problem you are having. You > gave us some code, and then said "That is not the underlying issue". Okay, > so what is the underlying issue? What is the problem you want help solving? > > In another post, you responded to John Gordon's question: > > # John > Have you tried creating some dummy data by hand and seeing > how makerows() handles it? > > > by answering: > > Yes I did do this. > > > Okay. What was the result? Do you want us to guess what result you got? > > > John also suggested that you provide sample data, and an implementation of > flatten_dict, and your answer is: > > Yes, unfortunately, due to NDA protocols I cannot share this. > > > You don't have to provide your *actual* data. You can provide *sample* data, > that does not contain any of your actual confidential values. If your XML > file looks like this: > > <?xml version="1.0"?> > <catalog> > <book id="bk101"> > <author>Gambardella, Matthew</author> > <title>XML Developer's Guide</title> > <genre>Computer</genre> > <price>44.95</price> > <publish_date>2000-10-01</publish_date> > <description>An in-depth look at creating applications > with XML.</description> > </book> > </catalog> > > > you can replace the data: > > <?xml version="1.0"?> > <catalog> > <book id="1111"> > <author>Smith, John</author> > <title>ABCDEF</title> > <genre>Widgets</genre> > <price>9999.99</price> > <publish_date>1900-01-01</publish_date> > <description>blah blah blah blah</description> > </book> > </catalog> > > > You can even change the tags: > > > <?xml version="1.0"?> > <whatzit> > <spam id="1111"> > <a>Smith, John</a> > <b>ABCDEF</b> > <c>Widgets</c> > <d>9999.99</d> > <e>1900-01-01</e> > <f>blah blah blah blah</f> > </spam> > </whatzit> > > > If you're still worried that the sample XML has the same structure as your > real data, you can remove some fields and add new ones: > > <?xml version="1.0"?> > <whatzit> > <spam id="1111"> > <b>ABCDEF</b> > <d>9999.99</d> > <e>1900-01-01</e> > <z>fe fi fo fum</z> > <f>blah blah blah blah</f> > </spam> > </whatzit> > > > If you can't share the flatten_dict() function, either: > > (1) get permission to share it from your manager or project leader. > flatten_dict is not a trade secret or valuable in any way, and > half-competent Python programmer can probably come up with two or three > different ways to flatten a dict in five minutes. They're all going to look > more or less the same, because there's only so many ways to flatten a dict. > > (2) Or accept that we can't help you, and deal with it on your own. > > > > > Detailed and constructive feedback > > (like your reply to my post regarding importing functions) is more useful > > than to "squeeze lemon juice" into one's keyboard. > > Of course. That is why I said it was NOT the solution. Don't waste your time > squeezing lemon juice over your keyboard, it won't solve your problem. > > But you can't expect us to guess what your problem is, or debug code we > can't see, or read your mind and understand your data. > > Before you ask any more questions, please read this: > > http://sscce.org/ > > > > -- > Steven On Wednesday, June 24, 2015 at 8:38:24 AM UTC-4, Steven D'Aprano wrote: > On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote: > > > On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote: > >> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote: > >> > >> > That is not the underlying issue. Any thoughts or suggestions would be > >> > very helpful. > >> > >> > >> Thank you for spending over 100 lines to tell us what is NOT the > >> underlying issue. I will therefore tell you what is NOT the solution to > >> your problem (whatever it is, since I can't tell). The solution is NOT to > >> squeeze lemon juice into your keyboard. > >> > >> If someday you feel like telling us what the issue actually IS, instead > >> of what it IS NOT, then perhaps we will have a chance to help you find a > >> solution. > >> > >> > >> > >> -- > >> Steven > > > > Curious - what should I have provided? > > To start with, you should tell us what is the problem you are having. You > gave us some code, and then said "That is not the underlying issue". Okay, > so what is the underlying issue? What is the problem you want help solving? > > In another post, you responded to John Gordon's question: > > # John > Have you tried creating some dummy data by hand and seeing > how makerows() handles it? > > > by answering: > > Yes I did do this. > > > Okay. What was the result? Do you want us to guess what result you got? > > > John also suggested that you provide sample data, and an implementation of > flatten_dict, and your answer is: > > Yes, unfortunately, due to NDA protocols I cannot share this. > > > You don't have to provide your *actual* data. You can provide *sample* data, > that does not contain any of your actual confidential values. If your XML > file looks like this: > > <?xml version="1.0"?> > <catalog> > <book id="bk101"> > <author>Gambardella, Matthew</author> > <title>XML Developer's Guide</title> > <genre>Computer</genre> > <price>44.95</price> > <publish_date>2000-10-01</publish_date> > <description>An in-depth look at creating applications > with XML.</description> > </book> > </catalog> > > > you can replace the data: > > <?xml version="1.0"?> > <catalog> > <book id="1111"> > <author>Smith, John</author> > <title>ABCDEF</title> > <genre>Widgets</genre> > <price>9999.99</price> > <publish_date>1900-01-01</publish_date> > <description>blah blah blah blah</description> > </book> > </catalog> > > > You can even change the tags: > > > <?xml version="1.0"?> > <whatzit> > <spam id="1111"> > <a>Smith, John</a> > <b>ABCDEF</b> > <c>Widgets</c> > <d>9999.99</d> > <e>1900-01-01</e> > <f>blah blah blah blah</f> > </spam> > </whatzit> > > > If you're still worried that the sample XML has the same structure as your > real data, you can remove some fields and add new ones: > > <?xml version="1.0"?> > <whatzit> > <spam id="1111"> > <b>ABCDEF</b> > <d>9999.99</d> > <e>1900-01-01</e> > <z>fe fi fo fum</z> > <f>blah blah blah blah</f> > </spam> > </whatzit> > > > If you can't share the flatten_dict() function, either: > > (1) get permission to share it from your manager or project leader. > flatten_dict is not a trade secret or valuable in any way, and > half-competent Python programmer can probably come up with two or three > different ways to flatten a dict in five minutes. They're all going to look > more or less the same, because there's only so many ways to flatten a dict. > > (2) Or accept that we can't help you, and deal with it on your own. > > > > > Detailed and constructive feedback > > (like your reply to my post regarding importing functions) is more useful > > than to "squeeze lemon juice" into one's keyboard. > > Of course. That is why I said it was NOT the solution. Don't waste your time > squeezing lemon juice over your keyboard, it won't solve your problem. > > But you can't expect us to guess what your problem is, or debug code we > can't see, or read your mind and understand your data. > > Before you ask any more questions, please read this: > > http://sscce.org/ > > > > -- > Steven Thanks for the feedback. To be quite honest with you, I have just started a new position as a data engineer after switching careers from secondary education. This feedback will help me adapt to this community. Thank you again for your continued support.
[toc] | [prev] | [next] | [standalone]
| From | kbtyo <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-06-25 09:37 -0700 |
| Message-ID | <c51f1026-547b-4322-9469-17f8b88e8973@googlegroups.com> |
| In reply to | #93077 |
Okay, so I have gone back to the drawing board and have the following predicament (my apologies, in advance for the indentation):
Here is my sample:
<Response ID="24856-775" RequestType="Moverview">
<MonthDayCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</MonthDayCount>
<FeeCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</FeeCount>
<PaymentBucketAmount>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
</PaymentBucketAmount>
<PaymentBucketDueDate>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
</PaymentBucketDueDate>
<Warnings />
<SList />
<LList />
<PA>False</PA>
<PL>False</PL>
<PC>False</PC>
<PCs>False</PCs>
<PJ>False</PJ>
<OITC>0</OITC>
<MG />
<R />
<CCGoods />
</Response>
Using this:
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import csv
def flatten_list(aList, prefix=''):
for i, element in enumerate(aList, 1):
eprefix = "{}{}".format(prefix, i)
if element:
# treat like dict
if len(element) == 1 or element[0].tag != element[1].tag:
yield from flatten_dict(element, eprefix)
# treat like list
elif element[0].tag == element[1].tag:
yield from flatten_list(element, eprefix)
elif element.text:
text = element.text.strip()
if text:
yield eprefix[:].rstrip('.'), element.text
def flatten_dict(parent_element, prefix=''):
prefix = prefix + parent_element.tag
if parent_element.items():
for k, v in parent_element.items():
yield prefix + k, v
for element in parent_element:
eprefix = prefix + element.tag
if element:
# treat like dict - we assume that if the first two tags
# in a series are different, then they are all different.
if len(element) == 1 or element[0].tag != element[1].tag:
yield from flatten_dict(element, prefix=prefix)
# treat like list - we assume that if the first two tags
# in a series are the same, then the rest are the same.
else:
# here, we put the list in dictionary; the key is the
# tag name the list elements all share in common, and
# the value is the list itself
yield from flatten_list(element, prefix=eprefix)
# if the tag has attributes, add those to the dict
if element.items():
for k, v in element.items():
yield eprefix+k
# this assumes that if you've got an attribute in a tag,
# you won't be having any text. This may or may not be a
# good idea -- time will tell. It works for the way we are
# currently doing XML configuration files...
elif element.items():
for k, v in element.items():
yield eprefix+k
# finally, if there are no child tags and no attributes, extract
# the text
else:
yield eprefix, element.text
def makerows(pairs):
headers = []
columns = {}
for k, v in pairs:
if k in columns:
columns[k].extend((v,))
else:
headers.append(k)
columns[k] = [k, v]
m = max(len(c) for c in columns.values())
for c in columns.values():
c.extend(' ' for i in range(len(c), m))
L = [columns[k] for k in headers]
rows = list(zip(*L))
return rows
def main():
with open('sample.xml', 'r', encoding='utf-8') as f:
xml_string = f.read()
xml_string= xml_string.replace('�', '') #optional to remove ampersands.
root = ElementTree.XML(xml_string)
for key, value in flatten_dict(root):
key = key.rstrip('.').rsplit('.', 1)[-1]
print(key,value)
if __name__ == "__main__":
main()
I receive this output:
ResponseRequestType Moverview
ResponseID 24856-775
ResponseMonthDayCount1 0
ResponseMonthDayCount2 0
ResponseMonthDayCount3 0
ResponseMonthDayCount4 0
ResponseMonthDayCount5 0
ResponseMonthDayCount6 0
ResponseMonthDayCount7 0
ResponseMonthDayCount8 0
ResponseMonthDayCount9 0
ResponseMonthDayCount10 0
ResponseMonthDayCount11 0
ResponseMonthDayCount12 0
ResponseMonthDayCount13 0
ResponseMonthDayCount14 0
ResponseMonthDayCount15 0
ResponseMonthDayCount16 0
ResponseMonthDayCount17 0
ResponseMonthDayCount18 0
ResponseMonthDayCount19 0
ResponseMonthDayCount20 0
ResponseMonthDayCount21 0
ResponseMonthDayCount22 0
ResponseMonthDayCount23 0
ResponseMonthDayCount24 0
ResponseMonthDayCount25 0
ResponseFeeCount1 0
ResponseFeeCount2 0
ResponseFeeCount3 0
ResponseFeeCount4 0
ResponseFeeCount5 0
ResponseFeeCount6 0
ResponsePaymentBucketAmount1 0
ResponsePaymentBucketAmount2 0
ResponsePaymentBucketAmount3 0
ResponsePaymentBucketAmount4 0
ResponsePaymentBucketAmount5 0
ResponsePaymentBucketAmount6 0
ResponsePaymentBucketAmount7 0
ResponsePaymentBucketAmount8 0
ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
ResponseWarnings None
ResponseSList None
ResponseLList None
ResponsePA False
ResponsePL False
ResponsePC False
ResponsePCs False
ResponsePJ False
ResponseOITC 0
ResponseMG None
ResponseR None
ResponseCCGoods None
When I write it out to the CSV, using :
writer = csv.writer(open("try2.csv", 'wt'))
writer.writerows(makerows(flatten_dict(root)))`
I still receive the headers with the **Response** chained to the sub-elements of the root with the tags' text as the values (which is just fine). My goal is to only have the sublements as the headers (along with their values), sans the Response (unless it is the tag name). Thanks you all for your feedback! For example, ResponsePaymentBucketAmount1 0, should just be PaymentBucketAmount1 as the header.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web