Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #93043 > unrolled thread

To write headers once with different values in separate row in CSV

Started bySahlusar <ahlusar.ahluwalia@gmail.com>
First post2015-06-23 10:15 -0700
Last post2015-06-25 09:37 -0700
Articles 8 — 4 participants

Back to article view | Back to comp.lang.python


Contents

  To write headers once with different values in separate row in CSV Sahlusar <ahlusar.ahluwalia@gmail.com> - 2015-06-23 10:15 -0700
    Re: To write headers once with different values in separate row in CSV John Gordon <gordon@panix.com> - 2015-06-23 19:12 +0000
      Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 04:31 -0700
    Re: To write headers once with different values in separate row in CSV Steven D'Aprano <steve@pearwood.info> - 2015-06-24 11:50 +1000
      Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 04:37 -0700
        Re: To write headers once with different values in separate row in CSV Steven D'Aprano <steve@pearwood.info> - 2015-06-24 22:38 +1000
          Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-24 05:52 -0700
          Re: To write headers once with different values in separate row in CSV kbtyo <ahlusar.ahluwalia@gmail.com> - 2015-06-25 09:37 -0700

#93043 — To write headers once with different values in separate row in CSV

FromSahlusar <ahlusar.ahluwalia@gmail.com>
Date2015-06-23 10:15 -0700
SubjectTo write headers once with different values in separate row in CSV
Message-ID<c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com>
I have the following script for writing out to CSV two items in a list to a CSV in such a format, such that if we have:

L = [('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('D', 8)]

we want

A B C D
1 2 3 4
5 6   8

And with this 

L = [('A', 1), ('B', 2), ('C', 3), ('D', 4),('D', 5), ('B', 6), ('A', 8)]

A B C D
1 2 3 4
8 6   5

I have this script: 

    def makerows(pairs):
        headers = []
        columns = {}
        for k, v in pairs:
            if k in columns:
                columns[k].extend((v,))
            else:
                headers.append(k)
                columns[k] = [k, v]
        m = max(len(c) for c in columns.values())
        for c in columns.values():
            c.extend('' for i in range(len(c), m))
        L = [columns[k] for k in headers]
        rows = list(zip(*L))
        return rows

//With this test//


    if __name__ == '__main__':
        
        lists = [
            [('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('C', 7), ('D', 8)],
            [('A', 1), ('B', 2), ('C', 3), ('D', 4),('A', 5), ('B', 6), ('D', 8)],
            [('A', 1), ('B', 2), ('C', 3), ('D', 4),('D', 5), ('B', 6), ('A', 8)],
        ]
        from pprint import pprint
        for data in lists:
            print(data)
            pprint(makerows(data))


output:
    
    [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('A', 5), ('B', 6), ('C', 7), ('D', 8)]
    [('A', 'B', 'C', 'D'), (1, 2, 3, 4), (5, 6, 7, 8)]
    [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('A', 5), ('B', 6), ('D', 8)]
    [('A', 'B', 'C', 'D'), (1, 2, 3, 4), (5, 6, '', 8)]
    [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('D', 5), ('B', 6), ('A', 8)]  

[('A', 'B', 'C', 'D'), (1, 2, 3, 4), (8, 6, '', 5)]

However, when I extrapolate this same logic with a list like:

('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress.TimeAtPreviousAddress.', None), where the headers/columns are the first item (only to be written out once) with different values. I receive an output CSV with repeating headers and values all printed in one long string (when opened in an application like Excel). 

I use this script:

def makerows(pairs):

        headers = []
        columns = {}
        for k, v in pairs:
            if k in columns:
                columns[k].extend((v,))
            else:
                headers.append(k)
                columns[k] = [k, v]
        m = max(len(c) for c in columns.values())
        for c in columns.values():
            c.extend(' ' for i in range(len(c), m))
        L = [columns[k] for k in headers]
        rows = list(zip(*L))
        return rows
    
    
    def main():
        with open('sample.xml', 'r', encoding='utf-8') as f: 
            xml_string = f.read() 
        xml_string= xml_string.replace('&#x0;', '') #optional to remove ampersands. 
        root = ElementTree.XML(xml_string) 
        for item in root:
            print(root)
        writer = csv.writer(open("test_out.csv", 'wt'))
        writer.writerows(makerows(flatten_dict(root)))
        
    if __name__ == "__main__":
            main()


************FYI, flatten_dict() parses XML elements and their text in to key, value pairs and then converts them into a list. That is not the underlying issue. Any thoughts or suggestions would be very helpful. Thank you. 

[toc] | [next] | [standalone]


#93045

FromJohn Gordon <gordon@panix.com>
Date2015-06-23 19:12 +0000
Message-ID<mmcb2s$dcm$1@reader1.panix.com>
In reply to#93043
In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:

> However, when I extrapolate this same logic with a list like:

> ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> .TimeAtPreviousAddress.', None), where the headers/columns are the first
> item (only to be written out once) with different values. I receive an
> output CSV with repeating headers and values all printed in one long string

First, I would try to determine if the problem is in the makerows()
function, or if the problem is elsewhere.

Have you tried creating some dummy data by hand and seeing how makerows()
handles it?

(By the way, if your post had included some sample data that illustrates
the problem, it would have been much easier to figure out a solution.
Instead, we are left guessing at your XML format, and at the particular
implementation of flatten_dict().)

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

[toc] | [prev] | [next] | [standalone]


#93072

Fromkbtyo <ahlusar.ahluwalia@gmail.com>
Date2015-06-24 04:31 -0700
Message-ID<3e93574e-a6b5-4ca4-8bba-c37433848f1a@googlegroups.com>
In reply to#93045
On Tuesday, June 23, 2015 at 3:12:40 PM UTC-4, John Gordon wrote:
> In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:
> 
> > However, when I extrapolate this same logic with a list like:
> 
> > ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> > .TimeAtPreviousAddress.', None), where the headers/columns are the first
> > item (only to be written out once) with different values. I receive an
> > output CSV with repeating headers and values all printed in one long string
> 
> First, I would try to determine if the problem is in the makerows()
> function, or if the problem is elsewhere.
> 
> Have you tried creating some dummy data by hand and seeing how makerows()
> handles it?
> 
> (By the way, if your post had included some sample data that illustrates
> the problem, it would have been much easier to figure out a solution.
> Instead, we are left guessing at your XML format, and at the particular
> implementation of flatten_dict().)
> 
> -- 
> John Gordon                   A is for Amy, who fell down the stairs
> gordon@panix.com              B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"



On Tuesday, June 23, 2015 at 3:12:40 PM UTC-4, John Gordon wrote:
> In <c0ea6bec-b6b1-48fd-9291-0fedcda7b76c@googlegroups.com> Sahlusar <ahlusar.ahluwalia@gmail.com> writes:
> 
> > However, when I extrapolate this same logic with a list like:
> 
> > ('Response.MemberO.PMembers.PMembers.Member.CurrentEmployer.EmployerAddress
> > .TimeAtPreviousAddress.', None), where the headers/columns are the first
> > item (only to be written out once) with different values. I receive an
> > output CSV with repeating headers and values all printed in one long string
> 
> First, I would try to determine if the problem is in the makerows()
> function, or if the problem is elsewhere.
> 
> Have you tried creating some dummy data by hand and seeing how makerows()
> handles it?
>


Yes I did do this.  


> (By the way, if your post had included some sample data that illustrates
> the problem, it would have been much easier to figure out a solution.
> Instead, we are left guessing at your XML format, and at the particular
> implementation of flatten_dict().)

Yes, unfortunately, due to NDA protocols I cannot share this. 
> 
> -- 
> John Gordon                   A is for Amy, who fell down the stairs
> gordon@panix.com              B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"

[toc] | [prev] | [next] | [standalone]


#93061

FromSteven D'Aprano <steve@pearwood.info>
Date2015-06-24 11:50 +1000
Message-ID<558a0cef$0$1670$c3e8da3$5496439d@news.astraweb.com>
In reply to#93043
On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:

> That is not the underlying issue. Any thoughts or suggestions would be
> very helpful.


Thank you for spending over 100 lines to tell us what is NOT the underlying
issue. I will therefore tell you what is NOT the solution to your problem
(whatever it is, since I can't tell). The solution is NOT to squeeze lemon
juice into your keyboard.

If someday you feel like telling us what the issue actually IS, instead of
what it IS NOT, then perhaps we will have a chance to help you find a
solution.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#93074

Fromkbtyo <ahlusar.ahluwalia@gmail.com>
Date2015-06-24 04:37 -0700
Message-ID<22c60270-d8e8-43eb-b58b-b6d86dc6732e@googlegroups.com>
In reply to#93061
On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
> 
> > That is not the underlying issue. Any thoughts or suggestions would be
> > very helpful.
> 
> 
> Thank you for spending over 100 lines to tell us what is NOT the underlying
> issue. I will therefore tell you what is NOT the solution to your problem
> (whatever it is, since I can't tell). The solution is NOT to squeeze lemon
> juice into your keyboard.
> 
> If someday you feel like telling us what the issue actually IS, instead of
> what it IS NOT, then perhaps we will have a chance to help you find a
> solution.
> 
> 
> 
> -- 
> Steven

Curious - what should I have provided? Detailed and constructive feedback (like your reply to my post regarding importing functions) is more useful than to "squeeze lemon juice" into one's keyboard. 

[toc] | [prev] | [next] | [standalone]


#93077

FromSteven D'Aprano <steve@pearwood.info>
Date2015-06-24 22:38 +1000
Message-ID<558aa4b1$0$1671$c3e8da3$5496439d@news.astraweb.com>
In reply to#93074
On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote:

> On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
>> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
>> 
>> > That is not the underlying issue. Any thoughts or suggestions would be
>> > very helpful.
>> 
>> 
>> Thank you for spending over 100 lines to tell us what is NOT the
>> underlying issue. I will therefore tell you what is NOT the solution to
>> your problem (whatever it is, since I can't tell). The solution is NOT to
>> squeeze lemon juice into your keyboard.
>> 
>> If someday you feel like telling us what the issue actually IS, instead
>> of what it IS NOT, then perhaps we will have a chance to help you find a
>> solution.
>> 
>> 
>> 
>> --
>> Steven
> 
> Curious - what should I have provided? 

To start with, you should tell us what is the problem you are having. You
gave us some code, and then said "That is not the underlying issue". Okay,
so what is the underlying issue? What is the problem you want help solving?

In another post, you responded to John Gordon's question:

    # John
    Have you tried creating some dummy data by hand and seeing 
    how makerows() handles it?


by answering:

    Yes I did do this.


Okay. What was the result? Do you want us to guess what result you got?


John also suggested that you provide sample data, and an implementation of
flatten_dict, and your answer is:

    Yes, unfortunately, due to NDA protocols I cannot share this.


You don't have to provide your *actual* data. You can provide *sample* data,
that does not contain any of your actual confidential values. If your XML
file looks like this:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
</catalog>


you can replace the data:

<?xml version="1.0"?>
<catalog>
   <book id="1111">
      <author>Smith, John</author>
      <title>ABCDEF</title>
      <genre>Widgets</genre>
      <price>9999.99</price>
      <publish_date>1900-01-01</publish_date>
      <description>blah blah blah blah</description>
   </book>
</catalog>


You can even change the tags:


<?xml version="1.0"?>
<whatzit>
   <spam id="1111">
      <a>Smith, John</a>
      <b>ABCDEF</b>
      <c>Widgets</c>
      <d>9999.99</d>
      <e>1900-01-01</e>
      <f>blah blah blah blah</f>
   </spam>
</whatzit>


If you're still worried that the sample XML has the same structure as your
real data, you can remove some fields and add new ones:

<?xml version="1.0"?>
<whatzit>
   <spam id="1111">
      <b>ABCDEF</b>
      <d>9999.99</d>
      <e>1900-01-01</e>
      <z>fe fi fo fum</z>
      <f>blah blah blah blah</f>
   </spam>
</whatzit>


If you can't share the flatten_dict() function, either: 

(1) get permission to share it from your manager or project leader.
flatten_dict is not a trade secret or valuable in any way, and
half-competent Python programmer can probably come up with two or three
different ways to flatten a dict in five minutes. They're all going to look
more or less the same, because there's only so many ways to flatten a dict.

(2) Or accept that we can't help you, and deal with it on your own.



> Detailed and constructive feedback 
> (like your reply to my post regarding importing functions) is more useful
> than to "squeeze lemon juice" into one's keyboard.

Of course. That is why I said it was NOT the solution. Don't waste your time
squeezing lemon juice over your keyboard, it won't solve your problem.

But you can't expect us to guess what your problem is, or debug code we
can't see, or read your mind and understand your data.

Before you ask any more questions, please read this:

http://sscce.org/



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#93078

Fromkbtyo <ahlusar.ahluwalia@gmail.com>
Date2015-06-24 05:52 -0700
Message-ID<a0bf011e-dbae-4787-9525-ebad6e42fbf8@googlegroups.com>
In reply to#93077
On Wednesday, June 24, 2015 at 8:38:24 AM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote:
> 
> > On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
> >> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
> >> 
> >> > That is not the underlying issue. Any thoughts or suggestions would be
> >> > very helpful.
> >> 
> >> 
> >> Thank you for spending over 100 lines to tell us what is NOT the
> >> underlying issue. I will therefore tell you what is NOT the solution to
> >> your problem (whatever it is, since I can't tell). The solution is NOT to
> >> squeeze lemon juice into your keyboard.
> >> 
> >> If someday you feel like telling us what the issue actually IS, instead
> >> of what it IS NOT, then perhaps we will have a chance to help you find a
> >> solution.
> >> 
> >> 
> >> 
> >> --
> >> Steven
> > 
> > Curious - what should I have provided? 
> 
> To start with, you should tell us what is the problem you are having. You
> gave us some code, and then said "That is not the underlying issue". Okay,
> so what is the underlying issue? What is the problem you want help solving?
> 
> In another post, you responded to John Gordon's question:
> 
>     # John
>     Have you tried creating some dummy data by hand and seeing 
>     how makerows() handles it?
> 
> 
> by answering:
> 
>     Yes I did do this.
> 
> 
> Okay. What was the result? Do you want us to guess what result you got?
> 
> 
> John also suggested that you provide sample data, and an implementation of
> flatten_dict, and your answer is:
> 
>     Yes, unfortunately, due to NDA protocols I cannot share this.
> 
> 
> You don't have to provide your *actual* data. You can provide *sample* data,
> that does not contain any of your actual confidential values. If your XML
> file looks like this:
> 
> <?xml version="1.0"?>
> <catalog>
>    <book id="bk101">
>       <author>Gambardella, Matthew</author>
>       <title>XML Developer's Guide</title>
>       <genre>Computer</genre>
>       <price>44.95</price>
>       <publish_date>2000-10-01</publish_date>
>       <description>An in-depth look at creating applications 
>       with XML.</description>
>    </book>
> </catalog>
> 
> 
> you can replace the data:
> 
> <?xml version="1.0"?>
> <catalog>
>    <book id="1111">
>       <author>Smith, John</author>
>       <title>ABCDEF</title>
>       <genre>Widgets</genre>
>       <price>9999.99</price>
>       <publish_date>1900-01-01</publish_date>
>       <description>blah blah blah blah</description>
>    </book>
> </catalog>
> 
> 
> You can even change the tags:
> 
> 
> <?xml version="1.0"?>
> <whatzit>
>    <spam id="1111">
>       <a>Smith, John</a>
>       <b>ABCDEF</b>
>       <c>Widgets</c>
>       <d>9999.99</d>
>       <e>1900-01-01</e>
>       <f>blah blah blah blah</f>
>    </spam>
> </whatzit>
> 
> 
> If you're still worried that the sample XML has the same structure as your
> real data, you can remove some fields and add new ones:
> 
> <?xml version="1.0"?>
> <whatzit>
>    <spam id="1111">
>       <b>ABCDEF</b>
>       <d>9999.99</d>
>       <e>1900-01-01</e>
>       <z>fe fi fo fum</z>
>       <f>blah blah blah blah</f>
>    </spam>
> </whatzit>
> 
> 
> If you can't share the flatten_dict() function, either: 
> 
> (1) get permission to share it from your manager or project leader.
> flatten_dict is not a trade secret or valuable in any way, and
> half-competent Python programmer can probably come up with two or three
> different ways to flatten a dict in five minutes. They're all going to look
> more or less the same, because there's only so many ways to flatten a dict.
> 
> (2) Or accept that we can't help you, and deal with it on your own.
> 
> 
> 
> > Detailed and constructive feedback 
> > (like your reply to my post regarding importing functions) is more useful
> > than to "squeeze lemon juice" into one's keyboard.
> 
> Of course. That is why I said it was NOT the solution. Don't waste your time
> squeezing lemon juice over your keyboard, it won't solve your problem.
> 
> But you can't expect us to guess what your problem is, or debug code we
> can't see, or read your mind and understand your data.
> 
> Before you ask any more questions, please read this:
> 
> http://sscce.org/
> 
> 
> 
> -- 
> Steven



On Wednesday, June 24, 2015 at 8:38:24 AM UTC-4, Steven D'Aprano wrote:
> On Wed, 24 Jun 2015 09:37 pm, kbtyo wrote:
> 
> > On Tuesday, June 23, 2015 at 9:50:50 PM UTC-4, Steven D'Aprano wrote:
> >> On Wed, 24 Jun 2015 03:15 am, Sahlusar wrote:
> >> 
> >> > That is not the underlying issue. Any thoughts or suggestions would be
> >> > very helpful.
> >> 
> >> 
> >> Thank you for spending over 100 lines to tell us what is NOT the
> >> underlying issue. I will therefore tell you what is NOT the solution to
> >> your problem (whatever it is, since I can't tell). The solution is NOT to
> >> squeeze lemon juice into your keyboard.
> >> 
> >> If someday you feel like telling us what the issue actually IS, instead
> >> of what it IS NOT, then perhaps we will have a chance to help you find a
> >> solution.
> >> 
> >> 
> >> 
> >> --
> >> Steven
> > 
> > Curious - what should I have provided? 
> 
> To start with, you should tell us what is the problem you are having. You
> gave us some code, and then said "That is not the underlying issue". Okay,
> so what is the underlying issue? What is the problem you want help solving?
> 
> In another post, you responded to John Gordon's question:
> 
>     # John
>     Have you tried creating some dummy data by hand and seeing 
>     how makerows() handles it?
> 
> 
> by answering:
> 
>     Yes I did do this.
> 
> 
> Okay. What was the result? Do you want us to guess what result you got?
> 
> 
> John also suggested that you provide sample data, and an implementation of
> flatten_dict, and your answer is:
> 
>     Yes, unfortunately, due to NDA protocols I cannot share this.
> 
> 
> You don't have to provide your *actual* data. You can provide *sample* data,
> that does not contain any of your actual confidential values. If your XML
> file looks like this:
> 
> <?xml version="1.0"?>
> <catalog>
>    <book id="bk101">
>       <author>Gambardella, Matthew</author>
>       <title>XML Developer's Guide</title>
>       <genre>Computer</genre>
>       <price>44.95</price>
>       <publish_date>2000-10-01</publish_date>
>       <description>An in-depth look at creating applications 
>       with XML.</description>
>    </book>
> </catalog>
> 
> 
> you can replace the data:
> 
> <?xml version="1.0"?>
> <catalog>
>    <book id="1111">
>       <author>Smith, John</author>
>       <title>ABCDEF</title>
>       <genre>Widgets</genre>
>       <price>9999.99</price>
>       <publish_date>1900-01-01</publish_date>
>       <description>blah blah blah blah</description>
>    </book>
> </catalog>
> 
> 
> You can even change the tags:
> 
> 
> <?xml version="1.0"?>
> <whatzit>
>    <spam id="1111">
>       <a>Smith, John</a>
>       <b>ABCDEF</b>
>       <c>Widgets</c>
>       <d>9999.99</d>
>       <e>1900-01-01</e>
>       <f>blah blah blah blah</f>
>    </spam>
> </whatzit>
> 
> 
> If you're still worried that the sample XML has the same structure as your
> real data, you can remove some fields and add new ones:
> 
> <?xml version="1.0"?>
> <whatzit>
>    <spam id="1111">
>       <b>ABCDEF</b>
>       <d>9999.99</d>
>       <e>1900-01-01</e>
>       <z>fe fi fo fum</z>
>       <f>blah blah blah blah</f>
>    </spam>
> </whatzit>
> 
> 
> If you can't share the flatten_dict() function, either: 
> 
> (1) get permission to share it from your manager or project leader.
> flatten_dict is not a trade secret or valuable in any way, and
> half-competent Python programmer can probably come up with two or three
> different ways to flatten a dict in five minutes. They're all going to look
> more or less the same, because there's only so many ways to flatten a dict.
> 
> (2) Or accept that we can't help you, and deal with it on your own.
> 
> 
> 
> > Detailed and constructive feedback 
> > (like your reply to my post regarding importing functions) is more useful
> > than to "squeeze lemon juice" into one's keyboard.
> 
> Of course. That is why I said it was NOT the solution. Don't waste your time
> squeezing lemon juice over your keyboard, it won't solve your problem.
> 
> But you can't expect us to guess what your problem is, or debug code we
> can't see, or read your mind and understand your data.
> 
> Before you ask any more questions, please read this:
> 
> http://sscce.org/
> 
> 
> 
> -- 
> Steven

Thanks for the feedback. To be quite honest with you, I have just started a new position as a data engineer after switching careers from secondary education. This feedback will help me adapt to this community. Thank you again for your continued support. 

[toc] | [prev] | [next] | [standalone]


#93152

Fromkbtyo <ahlusar.ahluwalia@gmail.com>
Date2015-06-25 09:37 -0700
Message-ID<c51f1026-547b-4322-9469-17f8b88e8973@googlegroups.com>
In reply to#93077
Okay, so I have gone back to the drawing board and have the following predicament (my apologies, in advance for the indentation):

Here is my sample:
    <Response ID="24856-775" RequestType="Moverview">        
            <MonthDayCount>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
            </MonthDayCount>
    			<FeeCount>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    			</FeeCount>
    			<PaymentBucketAmount>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    					</PaymentBucketAmount>
    					<PaymentBucketDueDate>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    					</PaymentBucketDueDate>
            <Warnings />
            <SList />
            <LList />
            <PA>False</PA>
            <PL>False</PL>
            <PC>False</PC>
            <PCs>False</PCs>
            <PJ>False</PJ>
            <OITC>0</OITC>
            <MG />
            <R />
            <CCGoods />
    </Response>


Using this:


    import xml.etree.cElementTree as ElementTree 
    from xml.etree.ElementTree import XMLParser
    import csv
    
    def flatten_list(aList, prefix=''):
        for i, element in enumerate(aList, 1):
            eprefix = "{}{}".format(prefix, i)
            if element:
                # treat like dict 
                if len(element) == 1 or element[0].tag != element[1].tag: 
                    yield from flatten_dict(element, eprefix)
                # treat like list 
                elif element[0].tag == element[1].tag: 
                    yield from flatten_list(element, eprefix)
            elif element.text: 
                text = element.text.strip() 
                if text: 
                    yield eprefix[:].rstrip('.'), element.text
    
    def flatten_dict(parent_element, prefix=''):
        prefix = prefix + parent_element.tag 
        if parent_element.items():
            for k, v in parent_element.items():
                yield prefix + k, v
        for element in parent_element:
            eprefix = prefix + element.tag  
            if element:
                # treat like dict - we assume that if the first two tags 
                # in a series are different, then they are all different. 
                if len(element) == 1 or element[0].tag != element[1].tag: 
                    yield from flatten_dict(element, prefix=prefix)
                # treat like list - we assume that if the first two tags 
                # in a series are the same, then the rest are the same. 
                else: 
                    # here, we put the list in dictionary; the key is the 
                    # tag name the list elements all share in common, and 
                    # the value is the list itself
                    yield from flatten_list(element, prefix=eprefix)
                # if the tag has attributes, add those to the dict
                if element.items():
                    for k, v in element.items():
                        yield eprefix+k
            # this assumes that if you've got an attribute in a tag, 
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are 
            # currently doing XML configuration files... 
            elif element.items(): 
                for k, v in element.items():
                    yield eprefix+k
            # finally, if there are no child tags and no attributes, extract 
            # the text 
            else:
                yield eprefix, element.text                
    
    def makerows(pairs):
        headers = []
        columns = {}
        for k, v in pairs:
            if k in columns:
                columns[k].extend((v,))
            else:
                headers.append(k)
                columns[k] = [k, v]
        m = max(len(c) for c in columns.values())
        for c in columns.values():
            c.extend(' ' for i in range(len(c), m))
        L = [columns[k] for k in headers]
        rows = list(zip(*L))
        return rows                   
                        
    
    def main():
        with open('sample.xml', 'r', encoding='utf-8') as f: 
            xml_string = f.read() 
        xml_string= xml_string.replace('&#x0;', '') #optional to remove ampersands. 
        root = ElementTree.XML(xml_string) 
        for key, value in flatten_dict(root):
            key = key.rstrip('.').rsplit('.', 1)[-1]
            print(key,value)            
            
    if __name__ == "__main__":
        main()


I receive this output:

    ResponseRequestType Moverview
    ResponseID 24856-775
    ResponseMonthDayCount1 0
    ResponseMonthDayCount2 0
    ResponseMonthDayCount3 0
    ResponseMonthDayCount4 0
    ResponseMonthDayCount5 0
    ResponseMonthDayCount6 0
    ResponseMonthDayCount7 0
    ResponseMonthDayCount8 0
    ResponseMonthDayCount9 0
    ResponseMonthDayCount10 0
    ResponseMonthDayCount11 0
    ResponseMonthDayCount12 0
    ResponseMonthDayCount13 0
    ResponseMonthDayCount14 0
    ResponseMonthDayCount15 0
    ResponseMonthDayCount16 0
    ResponseMonthDayCount17 0
    ResponseMonthDayCount18 0
    ResponseMonthDayCount19 0
    ResponseMonthDayCount20 0
    ResponseMonthDayCount21 0
    ResponseMonthDayCount22 0
    ResponseMonthDayCount23 0
    ResponseMonthDayCount24 0
    ResponseMonthDayCount25 0
    ResponseFeeCount1 0
    ResponseFeeCount2 0
    ResponseFeeCount3 0
    ResponseFeeCount4 0
    ResponseFeeCount5 0
    ResponseFeeCount6 0
    ResponsePaymentBucketAmount1 0
    ResponsePaymentBucketAmount2 0
    ResponsePaymentBucketAmount3 0
    ResponsePaymentBucketAmount4 0
    ResponsePaymentBucketAmount5 0
    ResponsePaymentBucketAmount6 0
    ResponsePaymentBucketAmount7 0
    ResponsePaymentBucketAmount8 0
    ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
    ResponseWarnings None
    ResponseSList None
    ResponseLList None
    ResponsePA False
    ResponsePL False
    ResponsePC False
    ResponsePCs False
    ResponsePJ False
    ResponseOITC 0
    ResponseMG None
    ResponseR None
    ResponseCCGoods None


When I write it out to the CSV, using :        


    writer = csv.writer(open("try2.csv", 'wt')) 
    writer.writerows(makerows(flatten_dict(root)))`


I still receive the headers with the **Response** chained to the sub-elements of the root with the tags' text as the values (which is just fine). My goal is to only have the sublements as the headers (along with their values), sans the Response (unless it is the tag name). Thanks you all for your feedback! For example, ResponsePaymentBucketAmount1 0, should just be PaymentBucketAmount1 as the header.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web