Groups > comp.lang.python > #107781 > unrolled thread

Controlling the passing of data

Started by	Sayth Renshaw <flebber.crue@gmail.com>
First post	2016-04-28 05:02 -0700
Last post	2016-04-28 07:52 -0700
Articles	10 — 4 participants


  Controlling the passing of data Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-28 05:02 -0700
    Re: Controlling the passing of data Peter Otten <__peter__@web.de> - 2016-04-28 15:40 +0200
      Re: Controlling the passing of data Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-28 06:59 -0700
        RE: Controlling the passing of data Dan Strohl <D.Strohl@F5.com> - 2016-04-28 15:19 +0000
          Re: Controlling the passing of data Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-28 18:49 -0700
        Re: Controlling the passing of data Peter Otten <__peter__@web.de> - 2016-04-29 13:26 +0200
          Re: Controlling the passing of data Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-29 06:17 -0700
        Re: Controlling the passing of data Peter Otten <__peter__@web.de> - 2016-04-29 13:36 +0200
    RE: Controlling the passing of data Dan Strohl <D.Strohl@F5.com> - 2016-04-28 14:40 +0000
      Re: Controlling the passing of data Rustom Mody <rustompmody@gmail.com> - 2016-04-28 07:52 -0700

#107781 — Controlling the passing of data

From	Sayth Renshaw <flebber.crue@gmail.com>
Date	2016-04-28 05:02 -0700
Subject	Controlling the passing of data
Message-ID	<afa1ab1d-6d6a-421c-9351-c8805cc5fbea@googlegroups.com>

Hi

This file contains my biggest roadblock with programming: the abstract nature of needing to pass data from one thing to the next.

In my file here I need to traverse and modify the XML. I don't want to restore it or put it in a new variable or other format; I just want to alter it and let it flow on to the list comprehensions as they were.

Once I can get on top of this mentally I will be able to do so much better. I think I am trying to manage it in my head as if it were water and plumbing.

In particular, here I am taking the id from each race and putting it into the children of that race, which are called nomination.

I have put a comment above the new code which is causing the difficulty.

from pyquery import PyQuery as pq
import pandas as pd
import argparse
import numpy as np

# from glob import glob


parser = argparse.ArgumentParser(description=None)


def GetArgs(parser):
    """Parser function using argparse"""
    # parser.add_argument('directory', help='directory use',
    #                     action='store', nargs='*')
    parser.add_argument("files", nargs="+")
    return parser.parse_args()

fileList = GetArgs(parser)
# print(fileList.files)


data = []


horseattrs = ('race_id', 'id', 'horse', 'number', 'finished', 'age', 'sex',
              'blinkers', 'trainernumber', 'career', 'thistrack', 'firstup',
              'secondup', 'variedweight', 'weight', 'pricestarting')
meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
raceattrs = ('id', 'number', 'shortname', 'stage', 'distance',
             'grade', 'age', 'weightcondition', 'fastesttime', 'sectionaltime')
clubattrs = ('code',)  # trailing comma: a one-element tuple, not the string 'code'

frames = pd.DataFrame([])
noms = []
for items in fileList.files:
    d = pq(filename=items)
    meet = d('meeting')
    club = d('club')
    race = d('race')
    res = d('nomination')

    # d('p').filter(lambda i: i == 1)

    # Here i need to traverse and modify but I don't want to restore the
    # structure just pass it on. So I can use it in the following list
    # comprehensions as I had before.

    for race_el in d('race'):
        race = pq(race_el)
        race_id = race.attr('id')

    for nom_el in race.items('nomination'):
        res.append((pq(nom_el).attr('raceid', race_id)))

    resdata = [[res.eq(i).attr(x)
                for x in horseattrs] for i in range(len(res))]
    # print(dataSets)

    meetdata = [[meet.eq(i).attr(x)
                 for x in meetattrs] for i in range(len(meet))]
    racedata = [[race.eq(i).attr(x)
                 for x in raceattrs] for i in range(len(race))]
    clubdata = [[club.eq(i).attr(x)
                 for x in clubattrs] for i in range(len(club))]
    raceid = [row[0] for row in racedata]

# L = [x + [0] for x in L]
# print(resdata)
# resdata = [raceid[i] for i in raceid  x + i for x in resdata]

# for number of classes equalling nomination in the each category of
# race inset raceid into resdata
#
# print(resdata)
# clubdf = pd.DataFrame(clubdata)
# meetdf = pd.DataFrame(meetdata)
# racedf = pd.DataFrame(racedata)
# resdf = pd.DataFrame(resdata)
# frames = frames.append(clubdf)
# frames = frames.append(meetdf)
#
# frames = frames.append(racedf)
# frames = frames.append(resdf)

# print(frames)
# frames.to_csv('~/testingFrame5.csv', encoding='utf-8')

Thanks

Sayth


#107790

From	Peter Otten <__peter__@web.de>
Date	2016-04-28 15:40 +0200
Message-ID	<mailman.189.1461850832.32212.python-list@python.org>
In reply to	#107781

Sayth Renshaw wrote:

> In my file here I needed to traverse and modify the XML file I don't want
> to restore it or put it in a new variable or other format I just want to
> alter it and let it flow onto the list comprehensions as they were.

That looks like an arbitrary limitation to me. It's a bit like
"I want to repair my car with this big hammer".

> In particular here I am taking the id from race and putting it into the
> children of each race called nomination.
> 
> I have put a comment above the new code which is causing the difficulty.

Your actual problem is drowned in too much source code. Can you restate it 
in English, optionally with a few small snippets of Python? 

It is not even clear what the code you provide should accomplish once it's 
running as desired.

To give at least one piece of code-related advice: You have a few repetitions of the 
following structure

> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')

>     meet = d('meeting')

>     meetdata = [[meet.eq(i).attr(x)
>                  for x in meetattrs] for i in range(len(meet))]

You should move the pieces into a function that works for meetings, clubs, 
races, and so on. Finally (If I am repeating myself so be it): the occurrence 
of range(len(something)) in your code is a strong indication that you are 
not using Python the way Guido intended it. Iterate over the `something` 
directly whenever possible.
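
A minimal sketch of the "iterate directly" advice. It assumes the standard library's xml.etree.ElementTree rather than PyQuery, and a made-up two-race document; the attribute tuple is a trimmed, hypothetical version of raceattrs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file has many more attributes.
SAMPLE = """
<meeting id="1" venue="Somewhere">
  <race id="10" number="1"/>
  <race id="11" number="2"/>
</meeting>
"""

raceattrs = ("id", "number")
root = ET.fromstring(SAMPLE)

# Index-based version, in the style of the original post.
races = root.findall("race")
by_index = [[races[i].get(x) for x in raceattrs] for i in range(len(races))]

# Direct iteration: no index and no len() needed.
direct = [[race.get(x) for x in raceattrs] for race in root.iter("race")]
```

Both comprehensions build the same rows; the second simply names the thing being looped over.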


#107791

From	Sayth Renshaw <flebber.crue@gmail.com>
Date	2016-04-28 06:59 -0700
Message-ID	<cf68e844-441c-4498-a730-0c7b08da1ee9@googlegroups.com>
In reply to	#107790

> 
> Your actual problem is drowned in too much source code. Can you restate it 
> in English, optionally with a few small snippets of Python? 
> 
> It is not even clear what the code you provide should accomplish once it's 
> running as desired.
> 
> To give at least one code-related advice: You have a few repetitions of the 
> following structure
> 
> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
> 
> >     meet = d('meeting')
> 
> >     meetdata = [[meet.eq(i).attr(x)
> >                  for x in meetattrs] for i in range(len(meet))]
> 
> You should move the pieces into a function that works for meetings, clubs, 
> races, and so on. Finally (If I am repeating myself so be it): the occurrence 
> of range(len(something)) in your code is a strong indication that you are 
> not using Python the way Guido intended it. Iterate over the `something` 
> directly whenever possible.

Hi Peter

> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')

is created to define a list of attrs in the XML. Rather than referencing each attr individually, I create a list and pass it into 

 >     meetdata = [[meet.eq(i).attr(x)
> >                  for x in meetattrs] for i in range(len(meet))]

This list comprehension reads the XML attr by attr, using meet = d('meeting') as the call to pyquery to locate the class in the XML and identify the attrs.

I do apologise for the lack of output. I asked a question about parsing, which I always seem to get wrong: I overthink it and then find the solution is simpler than I thought.

The output is 4 tables, one per class, each holding the selected attributes, e.g. meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition') from the meeting class of the XML.

I want to give flexibility and keep the relational nature of the data in the tables. When exporting the nominations section via pandas to csv, I found that I had no way to determine which race each nomination belonged to: there was no race_id in the nominations to relate back to the race, and also no meeting id in the race to relate it back to the meeting.


So I wanted to traverse all the classes (Meeting, Race and Nomination) and insert the id of each class into its direct children only. Since there are many races per meeting and many nominations per race, I need to ensure that it is the direct children only.

Otherwise the supplied code was working: it parsed the XML and pushed the result to pandas to use its csv write capability.

So I inserted

    for race_el in d('race'):
        race = pq(race_el)
        race_id = race.attr('id')

    for nom_el in race.items('nomination'):
        res.append((pq(nom_el).attr('raceid', race_id)))

which traverses the races and inserts the race_id into the child nominations. However, what boggles me is how to pass this to the list comprehension that was working, without changing the data from XML or creating another intermediate step and variable. I just want to parse it as it was, but with the extra race_id included.
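
One possible way to do this is sketched below. It assumes the standard library's xml.etree.ElementTree instead of PyQuery, a hypothetical three-nomination document, and a shortened horseattrs. Two details differ from the snippet above: the inner loop is nested inside the outer one, so every race is processed rather than only the last, and the attribute is written as 'race_id', the same name that horseattrs reads (the snippet writes 'raceid').

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with made-up horse names.
SAMPLE = """
<meeting id="1">
  <race id="10">
    <nomination id="100" horse="First"/>
    <nomination id="101" horse="Second"/>
  </race>
  <race id="11">
    <nomination id="102" horse="Third"/>
  </race>
</meeting>
"""

horseattrs = ("race_id", "id", "horse")  # trimmed for the example
root = ET.fromstring(SAMPLE)

# Alter the tree in place: copy each race's id onto its direct children.
for race in root.iter("race"):
    for nom in race.findall("nomination"):  # findall matches direct children only
        nom.set("race_id", race.get("id"))

# The comprehension is unchanged in spirit: it just reads the new attribute.
resdata = [[nom.get(x) for x in horseattrs] for nom in root.iter("nomination")]
```

Because the elements are modified in place, no intermediate variable is needed between the loop and the comprehension.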


Thanks

Sayth


#107799

From	Dan Strohl <D.Strohl@F5.com>
Date	2016-04-28 15:19 +0000
Message-ID	<mailman.194.1461856753.32212.python-list@python.org>
In reply to	#107791

If I am reading this correctly... you have something like (you will have to excuse my lack of knowledge about what kinds of information these actually are):

<race id="1">
    <nomination>1234</nomination>
    <meeting>first</meeting>
</race>
<race id="2">
    <nomination>5678</nomination>
    <meeting>second</meeting>
</race>


And you want something like:
    nominations = [(1,1234), (2,5678)]
    meetings = [(1,'first'),(2,'second')]

if that is correct, my suggestion is to do something like (this is pseudo code, I didn't look up the exact calls to use):

nomination_list = []
meeting_list = []

for race_element in xml_file('race'):
    id = race_element.get_attr('id')
    for nomination_element in race_element('nomination'):
        nomination = nomination_element.get_text()
        nomination_list.append((id, nomination))

    for meeting_element in race_element('meeting'):
        meeting = meeting_element.get_text()
        meeting_list.append((id, meeting))
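
A runnable sketch of the pseudo code above, assuming the standard library's xml.etree.ElementTree. The <races> wrapper element is an assumption added so the sample is a single well-formed document, and the variable is called race_id to avoid shadowing the built-in id().

```python
import xml.etree.ElementTree as ET

# The sample from above, wrapped in a hypothetical <races> root element.
SAMPLE = """
<races>
  <race id="1">
    <nomination>1234</nomination>
    <meeting>first</meeting>
  </race>
  <race id="2">
    <nomination>5678</nomination>
    <meeting>second</meeting>
  </race>
</races>
"""

nomination_list = []
meeting_list = []

for race_element in ET.fromstring(SAMPLE).iter("race"):
    race_id = race_element.get("id")
    # Pair every child's text with the id of the race that contains it.
    for nomination_element in race_element.findall("nomination"):
        nomination_list.append((race_id, nomination_element.text))
    for meeting_element in race_element.findall("meeting"):
        meeting_list.append((race_id, meeting_element.text))
```

Note that attribute values and element text come back as strings, so the pairs are ('1', '1234') rather than (1, 1234).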






#107837

From	Sayth Renshaw <flebber.crue@gmail.com>
Date	2016-04-28 18:49 -0700
Message-ID	<892ed2df-b5ee-48ba-9cdc-4c05a8f097ab@googlegroups.com>
In reply to	#107799

On Friday, 29 April 2016 01:19:28 UTC+10, Dan Strohl  wrote:
> If I am reading this correctly... you have something like (you will have to excuse my lack of knowledge about what kinds of information these actually are):
> 
> <race id="1">
>     <nomination>1234</nomination>
>     <meeting>first</meeting>
> </race>
> <race id="2">
>     <nomination>5678</nomination>
>     <meeting>second</meeting>
> </race>
> 
> 
> And you want something like:
>     nominations = [(1,1234), (2,5678)]
>     meetings = [(1,'first'),(2,'second')]
> 
> if that is correct, my suggestion is to do something like (this is pseudo code, I didn't look up the exact calls to use):
> 
> nomination_list = []
> meeting_list = []
> 
> for race_element in xml_file('race'):
>     id = race_element.get_attr('id')
>     for nomination_element in race_element('nomination'):
>         nomination = nomination_element.get_text()
>         nomination_list.append((id, nomination))
> 
>     for meeting_element in race_element('meeting'):
>         meeting = meeting_element.get_text()
>         meeting_list.append((id, meeting))
> 
> 
> 
> 

Yes, in essence that is what I am trying to achieve; however, the XML I have has many attributes on each element.

For example, this is one nomination:

<nomination number="1" saddlecloth="1" horse="Astern" id="198247" idnumber="" regnumber="" blinkers="0" trainernumber="235" trainersurname="O'Shea" trainerfirstname="John" trainertrack="Agnes Banks/Hawkesbury" rsbtrainername="John O'Shea" jockeynumber="86876" jockeysurname="McDonald" jockeyfirstname="James" barrier="10" weight="56.5" rating="0" description="B C 2 Medaglia D'oro(USA) x Essaouira (Exceed And Excel)" colours="Royal Blue" owners="Godolphin" dob="2013-09-24T00:00:00" age="3" sex="C" career="3-2-0-0 $220750.00" thistrack="1-1-0-0 $68750.00" thisdistance="1-1-0-0 $152000.00" goodtrack="3-2-0-0 $220750.00" heavytrack="0-0-0-0" slowtrack="" deadtrack="" fasttrack="0-0-0-0" firstup="2-2-0-0 $220750.00" secondup="1-0-0-0" mindistancewin="0" maxdistancewin="0" finished="1" weightvariation="0" variedweight="56.5" decimalmargin="0.00" penalty="0" pricestarting="$2.15F" sectional200="0" sectional400="0" sectional600="0" sectional800="0" sectional1200="0" bonusindicator="" />

Therefore I thought that if I tried to do it like the code you posted it would soon become unwieldy.

> for race_element in xml_file('race'):
>     id = race_element.get_attr('id')
>     for nomination_element in race_element('nomination'):
>         nomination = nomination_element.get_text()
>         nomination_list.append((id, nomination))

So I created a list of the attributes of each class (meeting, race, nomination) and then passed that list through the list comprehension.

On putting out the code, though, I realised that whilst each class worked, I had no way to relate the race to the meeting or the nomination to the race, so if I then loaded the csv or created SQL to push it to a db it would lose its relations.

So when I say
meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')

In my thinking this is a table.
Meeting
id
venue
date
rail
weather
trackcondition

There is no foreign key relation to race, so in this question I am asking: shouldn't I put the meeting id as a foreign key into the race attributes before parsing race, so that the 'id' in meeting is related to the new foreign key in race? The id of race would then be put into nomination before parsing, in the same way.
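
To make the foreign key idea concrete, here is a small sketch using the standard library's sqlite3 module. The table layout, the column name meeting_id and the venue value are assumptions for illustration, not taken from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE meeting (id TEXT PRIMARY KEY, venue TEXT)")
conn.execute(
    "CREATE TABLE race ("
    " id TEXT PRIMARY KEY,"
    " meeting_id TEXT NOT NULL REFERENCES meeting(id),"  # the foreign key
    " number TEXT)"
)

# Hypothetical rows: one meeting with two races.
conn.execute("INSERT INTO meeting VALUES ('1', 'Somewhere')")
conn.executemany(
    "INSERT INTO race VALUES (?, ?, ?)",
    [("10", "1", "1"), ("11", "1", "2")],
)

# With meeting_id stored on every race, the relation survives the export.
rows = conn.execute(
    "SELECT race.id, meeting.venue FROM race"
    " JOIN meeting ON meeting.id = race.meeting_id"
    " ORDER BY race.id"
).fetchall()
```

The same column would work in a CSV file: each race row carries the id of its meeting, and each nomination row would carry the id of its race.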

Hoping this is clearer. I am probably a little too close to the problem to express it clearly, so I apologise for that.

Sayth


#107850

From	Peter Otten <__peter__@web.de>
Date	2016-04-29 13:26 +0200
Message-ID	<mailman.218.1461929203.32212.python-list@python.org>
In reply to	#107791

Sayth Renshaw wrote:

> 
>> 
>> Your actual problem is drowned in too much source code. Can you restate
>> it in English, optionally with a few small snippets of Python?
>> 
>> It is not even clear what the code you provide should accomplish once
>> it's running as desired.
>> 
>> To give at least one code-related advice: You have a few repetitions of
>> the following structure
>> 
>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
>> 
>> You should move the pieces into a function that works for meetings,
>> clubs, races, and so on. Finally (If I am repeating myself so be it): the
>> occurrence of range(len(something)) in your code is a strong indication
>> that you are not using Python the way Guido intended it. Iterate over the
>> `something` directly whenever possible.
> 
> Hi Peter
> 
>> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
> 
> is created to define a list of attr in the XML rather than referencing
> each attr individually I create a list and pass it into
> 
>  >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
> 
> This list comprehension reads the XML attr by attr using meet =
> d('meeting') as the call to pyquery to locate the class in the XML and
> identify the attr's.

You misunderstood me. I do understand what your code does, I just have no 
idea what you want to do, in terms of the domain, like e. g.

"Print horses with the last three races they took part in."

Why does this matter? Here's an extreme example:

bars = []
for foo in whatever:
   bars.append(foo.baz)

What does this do? The description

"It puts all baz attributes of the items in whatever into a list"

doesn't help. If you say "I want to make a list of all brands in the car 
park" I could recommend a change to

brand = set(car.brand for car in car_park)

because a set avoids duplicates. If you say "I want to document my 
achievements for posterity" I would recommend that you print to a file 
rather than append to a list and the original code could be changed to

with open("somefile", "w") as f:  # "w": the default mode is read-only
    for achievement in my_achievements:
        print(achievement.description, file=f)


Back to my coding hint: Don't repeat yourself. If you move the pieces

>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]

into a function

def extract_attrs(nodes, attrs):
    return [[nodes.eq(i).attr(name) for name in attrs]
            for i in range(len(nodes))]

You can reuse it for clubs, races, etc.:

meetdata = extract_attrs(d("meeting"), meetattrs)
racedata = extract_attrs(d("race"), raceattrs)

If you put the parts into a dict you can generalize even further:

tables = {
   "meeting": ([], meetattrs),
   "race": ([], raceattrs),
}
for name, (data, attrs) in tables.items():
    data.extend(extract_attrs(d(name), attrs))
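
For anyone who wants to run the idea end to end, here is a self-contained sketch. It assumes the standard library's xml.etree.ElementTree in place of PyQuery, a hypothetical sample document, and shortened attribute tuples; extract_attrs also iterates over the nodes directly instead of using range(len(...)).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
SAMPLE = """
<meeting id="1" venue="Somewhere">
  <race id="10" number="1"/>
  <race id="11" number="2"/>
</meeting>
"""


def extract_attrs(nodes, attrs):
    """Return one row of attribute values per node."""
    return [[node.get(name) for name in attrs] for node in nodes]


root = ET.fromstring(SAMPLE)

# tag name -> (rows collected so far, attributes to read)
tables = {
    "meeting": ([], ("id", "venue")),
    "race": ([], ("id", "number")),
}
for name, (data, attrs) in tables.items():
    # Element.iter() also yields the root itself when its tag matches.
    data.extend(extract_attrs(root.iter(name), attrs))

meetdata = tables["meeting"][0]
racedata = tables["race"][0]
```

Adding a third table is then a one-line change to the dict rather than another copied comprehension.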

> 
> I do apologise for the lack of output, I asked a question about parsing
> that I always seem to get wrong over think and then find the solution
> simpler than I thought.
> 
> The output is 4 tables of the class and selected attributes eg meetattrs =
> ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition') from the
> meeting class of the XML.
> 
> In order to give flexibility and keep the relational nature they have
> defined in the table I found when exporting the nominations section via
> pandas to csv that i had no way to determine which id belonged to each
> race that is there was no race_id in the nominations to relate back to the
> race, and also no meeting id in the raceid to relate it back to the
> meeting.
> So I wanted to traverse all the classes Meeting, Race and Nomination and
> insert the id of the class into its direct children only and since there
> were many races a meeting and many nomnations a race I need to ensure that
> it is the direct children only.
> 
> It was otherwise working as parsed output in code supplied using to push
> to pandas and use its csv write capability.
> 
> So I inserted
> 
>     for race_el in d('race'):
>         race = pq(race_el)
>         race_id = race.attr('id')
> 
>     for nom_el in race.items('nomination'):
>         res.append((pq(nom_el).attr('raceid', race_id)))
> 
> which traverses and inserts the race_id into the child nominations.
> However, my boggles is how to pass this to the list comprehension that was
> working without changing the data from XML or creating another
> intermediate step and variable. Just to parse it as it was but with the
> extra included race_id.

So you want to go from a tree structure to a set of tables that preserves 
the structure by adding foreign keys. You could try a slightly different 
approach, something like

for meeting in meetings:
    meeting_table.append(...meeting attrs...)
    meeting_id = ...
    for race in meeting:
        race_table.append(meeting_id, ...race attrs...)
        race_id = ...
        for nomination in race:
            nomination_table.append(race_id, ...nomination attrs...)

I don't know how to spell this in PyQuery -- with lxml you could do 
something like

import lxml.etree

meeting_table = []
race_table = []
nomination_table = []
tree = lxml.etree.parse(filename)
for meeting in tree.xpath("/meeting"):
    meeting_table.append([meeting.attrib[name] for name in meetattrs])
    meeting_id = meeting.attrib["id"]
    for race in meeting.xpath("./race"):
        race_table.append(
            [meeting_id] + [race.attrib[name] for name in raceattrs])
        race_id = race.attrib["id"]
        for nomination in race.xpath("./nomination"):
            nomination_table.append(
                [race_id]
                + [nomination.attrib[name] for name in horseattrs])

Not as clean and not as general as I would hope -- basically I'm neglecting 
my recommendation from above -- but if it works for you I might take a 
second look later.
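
One way the nested loops might be generalized is a recursive walk that hands each element the id of its nearest recorded ancestor. This is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, a hypothetical sample document, and a made-up ATTRS mapping with shortened attribute lists.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data.
SAMPLE = """
<meeting id="1" venue="Somewhere">
  <race id="10" number="1">
    <nomination id="100" horse="A"/>
    <nomination id="101" horse="B"/>
  </race>
  <race id="11" number="2">
    <nomination id="102" horse="C"/>
  </race>
</meeting>
"""

# tag -> attributes to keep (trimmed stand-ins for meetattrs, raceattrs, horseattrs)
ATTRS = {
    "meeting": ("id", "venue"),
    "race": ("id", "number"),
    "nomination": ("id", "horse"),
}


def flatten(element, tables, parent_id=None):
    """Add a row for element, prefixed with its parent's id, then recurse."""
    attrs = ATTRS.get(element.tag)
    if attrs is not None:
        row = [element.get(name) for name in attrs]
        if parent_id is not None:
            row.insert(0, parent_id)  # the foreign key column comes first
        tables[element.tag].append(row)
        parent_id = element.get("id")
    for child in element:
        flatten(child, tables, parent_id)


tables = defaultdict(list)
flatten(ET.fromstring(SAMPLE), tables)
```

Unlike attrib[name], Element.get() returns None for a missing attribute instead of raising KeyError, which may or may not be what is wanted for the real files.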


#107855

From	Sayth Renshaw <flebber.crue@gmail.com>
Date	2016-04-29 06:17 -0700
Message-ID	<03178a33-04cb-491e-8cfb-9e6545fa092d@googlegroups.com>
In reply to	#107850

> because a set avoids duplicates. If you say "I want to document my 
> achievements for posterity" I would recommend that you print to a file 
> rather than append to a list and the original code could be changed to
> 
> with open("somefile", "w") as f:  # "w": the default mode is read-only
>     for achievement in my_achievements:
>         print(achievement.description, file=f)
> 
> 
> Back to my coding hint: Don't repeat yourself. If you move the pieces
> 
> >> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
> >> > 'trackcondition')
> >> 
> >> >     meet = d('meeting')
> >> 
> >> >     meetdata = [[meet.eq(i).attr(x)
> >> >                  for x in meetattrs] for i in range(len(meet))]
> 
> into a function
> 
> def extract_attrs(nodes, attrs):
>     return [[nodes.eq(i).attr(name) for name in attrs]
>             for i in range(len(nodes))]
> 
> You can reuse it for clubs, races, etc.:
> 
> meetdata = extract_attrs(d("meeting"), meetattrs)
> racedata = extract_attrs(d("race"), raceattrs)
> 
> If you put the parts into a dict you can generalize even further:
> 
> tables = {
>    "meeting": ([], meetattrs),
>    "race": ([], raceattrs),
> }
> for name, (data, attrs) in tables.items():
>     data.extend(extract_attrs(d(name), attrs))
> 

I find that really cool. It reads well too; I hadn't considered approaching it that way at all.

> So you want to go from a tree structure to a set of tables that preserves 
> the structure by adding foreign keys. You could try a slightly different 
> approach, something like
> 
> for meeting in meetings:
>     meeting_table.append(...meeting attrs...)
>     meeting_id = ...
>     for race in meeting:
>         race_table.append(meeting_id, ...race attrs...)
>         race_id = ...
>         for nomination in race:
>             nomination_table.append(race_id, ...nomination attrs...)
> 
> I don't know how to spell this in PyQuery -- with lxml you could do 
> something like
> 
> import lxml.etree
> 
> meeting_table = []
> race_table = []
> nomination_table = []
> tree = lxml.etree.parse(filename)
> for meeting in tree.xpath("/meeting"):
>     meeting_table.append([meeting.attrib[name] for name in meetattrs])
>     meeting_id = meeting.attrib["id"]
>     for race in meeting.xpath("./race"):
>         race_table.append(
>             [meeting_id] + [race.attrib[name] for name in raceattrs])
>         race_id = race.attrib["id"]
>         for nomination in race.xpath("./nomination"):
>             nomination_table.append(
>                 [race_id]
>                 + [nomination.attrib[name] for name in horseattrs])
> 
> Not as clean and not as general as I would hope -- basically I'm neglecting 
> my recommendation from above -- but if it works for you I might take a 
> second look later.

I need to play around with this just to understand it more, but I really like it. I might try to implement your advice from before and put it in a function.

Sayth


#107851

From	Peter Otten <__peter__@web.de>
Date	2016-04-29 13:36 +0200
Message-ID	<mailman.219.1461929769.32212.python-list@python.org>
In reply to	#107791

Sayth Renshaw wrote:

> 
>> 
>> Your actual problem is drowned in too much source code. Can you restate
>> it in English, optionally with a few small snippets of Python?
>> 
>> It is not even clear what the code you provide should accomplish once
>> it's running as desired.
>> 
>> To give at least one code-related advice: You have a few repetitions of
>> the following structure
>> 
>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
>> 
>> You should move the pieces into a function that works for meetings,
>> clubs, races, and so on. Finally (If I am repeating myself so be it): the
>> occurrence of range(len(something)) in your code is a strong indication
>> that you are not using Python the way Guido intended it. Iterate over the
>> `something` directly whenever possible.
> 
> Hi Peter
> 
>> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
> 
> is created to define a list of attr in the XML rather than referencing
> each attr individually I create a list and pass it into
> 
>  >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
> 
> This list comprehension reads the XML attr by attr using meet =
> d('meeting') as the call to pyquery to locate the class in the XML and
> identify the attr's.

You misunderstood me. I do understand what your code does, I just have no 
idea what you want to do, in terms of the domain, like e. g.

"Print horses with the last three races they took part in."

Why does this matter? Here's an extreme example:

bars = []
for foo in whatever:
   bars.append(foo.baz)

What does this do? The description

"It puts all baz attributes of the items in whatever into a list"

doesn't help. If you say "I want to make a list of all brands in the car 
park" I could recommend a change to

brand = set(car.brand for car in car_park)

because a set avoids duplicates. If you say "I want to document my 
achievements for posterity" I would recommend that you print to a file 
rather than append to a list and the original code could be changed to

with open("somefile", "w") as f:  # "w": the default mode is read-only
    for achievement in my_achievements:
        print(achievement.description, file=f)


Back to my coding hint: Don't repeat yourself. If you move the pieces

>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]

into a function

def extract_attrs(nodes, attrs):
    return [[nodes.eq(i).attr(name) for name in attrs]
            for i in range(len(nodes))]

You can reuse it for clubs, races, etc.:

meetdata = extract_attrs(d("meeting"), meetattrs)
racedata = extract_attrs(d("race"), raceattrs)

If you put the parts into a dict you can generalize even further:

tables = {
   "meeting": ([], meetattrs),
   "race": ([], raceattrs),
}
for name, (data, attrs) in tables.items():
    data.extend(extract_attrs(d(name), attrs))

> 
> I do apologise for the lack of output, I asked a question about parsing
> that I always seem to get wrong over think and then find the solution
> simpler than I thought.
> 
> The output is 4 tables of the class and selected attributes eg meetattrs =
> ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition') from the
> meeting class of the XML.
> 
> In order to give flexibility and keep the relational nature they have
> defined in the table I found when exporting the nominations section via
> pandas to csv that i had no way to determine which id belonged to each
> race that is there was no race_id in the nominations to relate back to the
> race, and also no meeting id in the raceid to relate it back to the
> meeting.
> 
> 
> So I wanted to traverse all the classes Meeting, Race and Nomination and
> insert the id of the class into its direct children only and since there
> were many races a meeting and many nomnations a race I need to ensure that
> it is the direct children only.
> 
> It was otherwise working as parsed output in code supplied using to push
> to pandas and use its csv write capability.
> 
> So I inserted
> 
>     for race_el in d('race'):
>         race = pq(race_el)
>         race_id = race.attr('id')
> 
>     for nom_el in race.items('nomination'):
>         res.append((pq(nom_el).attr('raceid', race_id)))
> 
> which traverses and inserts the race_id into the child nominations.
> However, my boggles is how to pass this to the list comprehension that was
> working without changing the data from XML or creating another
> intermediate step and variable. Just to parse it as it was but with the
> extra included race_id.

So you want to go from a tree structure to a set of tables and preserve the 
structural information:

for meeting in meetings:
    meeting_table.append(...meeting attributes...)
    meeting_id = ...
    for race in meeting.races:
        race_table.append(meeting_id, ...race attributes...)
        race_id = ...
        for nomination in race.nominations:
            nomination_table.append(race_id, ...nomination attributes...)

I don't know how to spell that in PyQuery, so here's how to do it with lxml:

import lxml.etree

meeting_table = []
race_table = []
nomination_table = []
tree = lxml.etree.parse(filename)
for meeting in tree.xpath("/meeting"):
    meeting_table.append([meeting.attrib[name] for name in meetattrs])
    meeting_id = meeting.attrib["id"]
    for race in meeting.xpath("./race"):
        race_table.append(
            [meeting_id] + [race.attrib[name] for name in raceattrs])
        race_id = race.attrib["id"]
        for nomination in race.xpath("./nomination"):
            nomination_table.append(
                [race_id]
                + [nomination.attrib[name] for name in horseattrs])

Not as clean and not as general as I would hope -- basically I'm neglecting 
my recommendations from above -- but if it works for you I might have a 
second look later.
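
For anyone who wants to try the nested traversal without the original file, the following is a self-contained sketch. It swaps lxml for the standard library's xml.etree.ElementTree, wraps the loops in a hypothetical build_tables() function, and runs against a tiny made-up XML string; the attribute tuples are shortened assumptions, not the real ones:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names follow the thread, values are invented.
SAMPLE = """
<meeting id="M1" venue="Randwick">
  <race id="R1" name="Maiden">
    <nomination id="N1" horse="Alpha"/>
    <nomination id="N2" horse="Bravo"/>
  </race>
  <race id="R2" name="Handicap">
    <nomination id="N3" horse="Charlie"/>
  </race>
</meeting>
"""

meetattrs = ("id", "venue")
raceattrs = ("id", "name")
horseattrs = ("id", "horse")


def build_tables(meetings):
    meeting_table, race_table, nomination_table = [], [], []
    for meeting in meetings:
        meeting_table.append([meeting.get(a) for a in meetattrs])
        meeting_id = meeting.get("id")
        # findall("race") matches direct children only.
        for race in meeting.findall("race"):
            race_table.append(
                [meeting_id] + [race.get(a) for a in raceattrs])
            race_id = race.get("id")
            for nomination in race.findall("nomination"):
                nomination_table.append(
                    [race_id] + [nomination.get(a) for a in horseattrs])
    return meeting_table, race_table, nomination_table


root = ET.fromstring(SAMPLE)
meeting_table, race_table, nomination_table = build_tables([root])
print(nomination_table)
```

Each row in race_table starts with the parent meeting's id and each row in nomination_table starts with the parent race's id, which is the relational link that was missing from the flat export.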


#107793

From	Dan Strohl <D.Strohl@F5.com>
Date	2016-04-28 14:40 +0000
Message-ID	<mailman.190.1461854474.32212.python-list@python.org>
In reply to	#107781

In addition to Peter's points, 
- I would suggest breaking out the list comprehensions into standard for loops and/or functions.  That makes it easier to read and troubleshoot.  (You can always re-optimize it if needed.)
- Peter's point about making things into functions will also help troubleshooting.  You can better test the data going into and out of the function.  In my code, I will often have the file access and data processing as separate functions, then the main routine just calls the function to get data, passes that data to a function that manipulates it, and then passes the results to a function that writes it out.  This allows for much easier testing and troubleshooting of the individual functions.  (Sometimes I will reassemble them into one function when I am done, depending on my needs.)
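
A minimal sketch of that read / process / write split might look like the following. The function names, the two-attribute tuple, and the one-race XML string are all made up for illustration, and the standard library's xml.etree.ElementTree and csv modules stand in for PyQuery and pandas:

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_races(xml_text):
    """Input step: parse XML text and return the race elements."""
    return ET.fromstring(xml_text).findall("race")


def to_rows(races, attrs):
    """Processing step: turn elements into lists of attribute values."""
    return [[race.get(name) for name in attrs] for race in races]


def write_csv(rows, header, out):
    """Output step: write rows to any file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def main(xml_text, out):
    attrs = ("id", "name")  # assumed attribute names
    write_csv(to_rows(read_races(xml_text), attrs), attrs, out)


# An in-memory buffer replaces the real output file while testing.
buffer = io.StringIO()
main('<meeting id="1"><race id="10" name="Maiden"/></meeting>', buffer)
print(buffer.getvalue())
```

Because each step takes its input as an argument and returns its result, any one of them can be called on its own with a small hand-made value.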

More importantly, though, in terms of getting help for your problem, the post is unclear (to me at least) about what you are trying to achieve and what isn't working.

Try considering the following suggestions:

- It is unclear what problem you are having.  You say you are trying to do x, and you are having problems in a part after a comment, but none of the comments say "this is breaking".  I assume you are talking about the comment that starts "Here I need to traverse", but I don't know for sure, and even if it is, you don't specify what the problem actually is.  Are you receiving an exception message (please let us know what it is)?  Is it running, but not doing anything?  Is it returning incorrect data?  Or something else?

- When posting, rather than commenting out code that you aren't using right now, and code that is working and not related to the problem, I recommend just deleting them so that people don't have to try to work through it.... for example, just remove the GetArgs function and just say fileList = "/xml_dir", and the section at the end that is all commented out, just remove it.  You should just have the minimum needed to replicate the problem.  

This will also help in troubleshooting.  When I have a problem like this and I can't figure out what is going on, I will copy the code to a new file and make a program that handles a specific set of data and tries to do the one thing that is breaking, removing all the rest of the stuff, and test the results.  So, for example, copy a sample of the xml with a couple of data items into a string var, and have a program that processes that and checks whether you end up with a list of the right values by printing the list, rather than muddying the waters with file access and writing out csv's.  (By the way, this is a great time to start working with unit testing if you aren't already.  It is simple to create this as a test case, and you will find that if you start doing testing along the way, the time it takes to troubleshoot errors will go down dramatically.)

- Be clear at the end what you expect to get, especially if it is not what you are getting... so, either in the code as a comment, or in a descriptive paragraph, have a section that says something like:  "At the end of the snippet, meetdata, racedata, clubdata, and raceid should be a list of dictionaries with the data from the xml" (or whatever... possibly with an example of what you would expect).  This is even more important if the problem you are having is that the code is not returning correct data.  This may not be as needed if the code is simply blowing up at line xx, though it would still help people understand your goal.

- For the example at least (you may choose to do differently in your live code), use nice explanatory variable names and don't rename imports, so it would be clearer to say "import pandas", then "frames = pandas.DataFrame([])".  That way the reader doesn't have to keep referring to the imports to figure out what is going on.
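
To make the unit-testing suggestion concrete, here is one possible shape for such a test. The sample XML, the nominations_with_race_id() helper, and the test class name are all hypothetical, and the standard library's xml.etree.ElementTree is used so the snippet needs nothing installed:

```python
import unittest
import xml.etree.ElementTree as ET

# A couple of data items copied into a string, as suggested above.
SAMPLE = """
<meeting id="M1">
  <race id="R1"><nomination id="N1"/><nomination id="N2"/></race>
  <race id="R2"><nomination id="N3"/></race>
</meeting>
"""


def nominations_with_race_id(root):
    """Return (race id, nomination id) pairs for direct children only."""
    return [(race.get("id"), nom.get("id"))
            for race in root.findall("race")
            for nom in race.findall("nomination")]


class NominationTest(unittest.TestCase):
    def test_each_nomination_gets_its_own_race_id(self):
        root = ET.fromstring(SAMPLE)
        # The expected value states the goal explicitly.
        self.assertEqual(
            nominations_with_race_id(root),
            [("R1", "N1"), ("R1", "N2"), ("R2", "N3")])
```

Saved in a file, this can be run with "python -m unittest <filename>". The expected list in the assertion doubles as the statement of what the code is supposed to produce.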

Remember, you are asking a large number of people for help, most of whom are pretty busy already, so the more you can do to simplify and show the exact problem, the more (and more useful) help you are likely to receive.  To this list's credit, even if you are completely unclear in your question, you will likely get *something* back (as you saw with Peter's response), but what you get back is more likely to be a general suggestion rather than a specific fix for your problem.

Dan Strohl




> -----Original Message-----
> From: Python-list [mailto:python-list-bounces+d.strohl=f5.com@python.org]
> On Behalf Of Peter Otten
> Sent: Thursday, April 28, 2016 6:40 AM
> To: python-list@python.org
> Subject: Re: Controlling the passing of data
> 
> Sayth Renshaw wrote:
> 
> > In my file here I needed to traverse and modify the XML file I don't
> > want to restore it or put it in a new variable or other format I just
> > want to alter it and let it flow onto the list comprehensions as they were.
> 
> That looks like an arbitrary limitation to me. It's a bit like "I want to repair my
> car with this big hammer".
> 
> > In particular here I am taking the id from race and putting it into
> > the children of each race called nomination.
> >
> > I have put a comment above the new code which is causing the difficulty.
> 
> Your actual problem is drowned in too much source code. Can you restate it
> in English, optionally with a few small snippets of Python?
> 
> It is not even clear what the code you provide should accomplish once it's
> running as desired.
> 
> To give at least one code-related advice: You have a few repetitions of the
> following structure
> 
> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
> > 'trackcondition')
> 
> >     meet = d('meeting')
> 
> >     meetdata = [[meet.eq(i).attr(x)
> >                  for x in meetattrs] for i in range(len(meet))]
> 
> You should move the pieces into a function that works for meetings, clubs,
> races, and so on. Finally (If I am repeating myself so be it): the occurence of
> range(len(something)) in your code is a strong indication that you are not
> using Python the way Guido intended it. Iterate over the `something` directly
> whenever possible.
> 
> --
> https://mail.python.org/mailman/listinfo/python-list


#107794

From	Rustom Mody <rustompmody@gmail.com>
Date	2016-04-28 07:52 -0700
Message-ID	<e338bac6-d31b-468f-a115-d4f788f094f8@googlegroups.com>
In reply to	#107793

On Thursday, April 28, 2016 at 8:11:26 PM UTC+5:30, Dan Strohl wrote:
> In addition to Peter's points, 
> - I would suggest breaking out the list comprehensions into standard for loops and/or functions.  That makes it easier to read and troubleshoot.  (you can always re-optimize It if needed.)
> - Peter's point about making things into functions will also help troubleshooting.  You can better test the data going into and out of the function.  In my code, I will often have the file access and data processing as separate functions, then the main routine just calls the function to get data, passes that data to a function that manipulates it, and then pass the results to a function that writes it out.  This allows for much easier testing and troubleshooting of the individual functions.  (sometimes I will reassemble them into one function when I am done, depending on my needs)
> 
> More importantly though in terms of your getting help for your problem, the post is unclear (to me at least) in terms of what you are trying to achieve and what isn't working, 
> 
> Try considering the following suggestions:
> 
> - It is unclear what the problem is that you are having, you say you are trying to do x, and you are having problems in a part after a comment, but none of the comments say "this is breaking".  I assume you are talking about the comment that starts "Here I need to traverse", but don't know for sure, and even if it is, you don't specify what the problem actually is;  are you receiving an exception message (please let us know what it is), is it running, but not doing anything?  is it returning incorrect data? or???
> 
> - When posting, rather than commenting out code that you aren't using right now, and code that is working and not related to the problem, I recommend just deleting them so that people don't have to try to work through it.... for example, just remove the GetArgs function and just say fileList = "/xml_dir", and the section at the end that is all commented out, just remove it.  You should just have the minimum needed to replicate the problem.  
> 
> This will also help in troubleshooting, when I have a problem like this, and I can't figure out what is going on, I will copy the code to a new file and make a program that will handle a specific set of data, and try to do the one thing that is breaking, removing all the rest of the stuff, and test the results.. so, for example, copy a sample of the xml with a couple of data items into a string var, and have a program that processes that and checks to see if at the end you end up with a list of the right values by printing a list rather than muddying the waters with file access, and writing out csv's.  (by the way, this is a great time to start working with unit testing if you aren't already, it is simple to create this as a test case and you will find that if you start doing testing along the way, the time it takes to troubleshoot errors along the way will go down dramatically.)
> 
> - Be clear at the end what you expect to get, especially if it is not what you are getting... so, either in the code as a comment, or in a descriptive paragraph, have a section that said something like:  "At the end of the snippet, meetdata, racedata, clubdata, and raceid should be a list of dictionaries with the data from the xml" (or whatever... possibly with an example of what you would expect).  This is even more important if the problem you are having is that the code is not returning correct data.  This may not be as needed if the code is simply blowing up at line xx, though it would still help people understand your goal.
> 
> - For the example at least (you may choose to do differently in your live code), use nice explanatory variable names and don't rename imports, so it would be clearer to say "import pandas", then "frames = pandas.DataFrame[])".  That way the reader doesn't have to keep referring to the imports to figure out what is going on.
> 
> Remember, you are asking a large number of people for help, most of which are pretty busy already, so the more you can do to simplify and show the exact problem, the more (and more useful) help you are likely to receive.  To this lists credit, even if you are completely unclear in your question, you will likely get *something* back, (as you saw with Peters response), but what you get back is more likely to be a general suggestion rather than a specific fix for your problem.
> 
> Dan Strohl
> 
> 
> 
> 
> > -----Original Message-----
> > 
> > Sayth Renshaw wrote:
> > 
> > > In my file here I needed to traverse and modify the XML file I don't
> > > want to restore it or put it in a new variable or other format I just
> > > want to alter it and let it flow onto the list comprehensions as they were.
> > 
> > That looks like an arbitrary limitation to me. It's a bit like "I want to repair my
> > car with this big hammer".
> > 
> > > In particular here I am taking the id from race and putting it into
> > > the children of each race called nomination.
> > >
> > > I have put a comment above the new code which is causing the difficulty.
> > 
> > Your actual problem is drowned in too much source code. Can you restate it
> > in English, optionally with a few small snippets of Python?
> > 
> > It is not even clear what the code you provide should accomplish once it's
> > running as desired.
> > 
> > To give at least one code-related advice: You have a few repetitions of the
> > following structure
> > 
> > > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
> > > 'trackcondition')
> > 
> > >     meet = d('meeting')
> > 
> > >     meetdata = [[meet.eq(i).attr(x)
> > >                  for x in meetattrs] for i in range(len(meet))]
> > 
> > You should move the pieces into a function that works for meetings, clubs,
> > races, and so on. Finally (If I am repeating myself so be it): the occurence of
> > range(len(something)) in your code is a strong indication that you are not
> > using Python the way Guido intended it. Iterate over the `something` directly
> > whenever possible.
> > 
> > --
> > https://mail.python.org/mailman/listinfo/python-list

To add to that:
It is right to have a dislike for bad code and a desire for good code.
But this desire does not translate into effective communication. In particular: I don't think anyone quite gets exactly what you are after in:

> I don't want to restore it or put it in a new variable or other format I 
> just want to alter it and let it flow onto the list comprehensions as they were. 

So please cut down your example to 1/4 the size:
- Tiny inline triple-quoted xml
- Your attempt so far
- What you would like to do different
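
As a guess at what such a cut-down example could look like (the XML below is invented, and the standard library's xml.etree.ElementTree stands in for PyQuery), this sketch alters the tree in place and then lets a flat list comprehension run over it unchanged apart from the extra attribute name:

```python
import xml.etree.ElementTree as ET

# Tiny inline XML; the values are made up.
XML = """
<meeting id="M1">
  <race id="R1"><nomination id="N1"/></race>
  <race id="R2"><nomination id="N2"/></race>
</meeting>
"""

root = ET.fromstring(XML)

# Alter the tree in place: copy each race's id onto its direct
# nomination children.
for race in root.findall("race"):
    for nomination in race.findall("nomination"):
        nomination.set("raceid", race.get("id"))

# The flat extraction stays a plain comprehension; only the attribute
# tuple gains "raceid".
horseattrs = ("raceid", "id")
nomdata = [[nom.get(name) for name in horseattrs]
           for nom in root.iter("nomination")]
print(nomdata)
```

Whether this matches the original intent is an assumption; a post of this size would at least make it easy for readers to say so.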

