

Newsgroup: comp.lang.python

parsing tree from excel sheet

Started by: al.basili@gmail.com (alb)
First post: 2015-01-28 10:12 +0000
Last post: 2015-01-29 21:22 +0000
Articles: 18 (7 participants)



Contents

  parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-28 10:12 +0000
    Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-01-28 15:08 +0100
      Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-28 14:27 +0000
      Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-29 21:02 +0000
        Re: parsing tree from excel sheet MRAB <python@mrabarnett.plus.com> - 2015-01-29 21:16 +0000
          Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-29 21:32 +0000
            Re: parsing tree from excel sheet Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-01-29 21:59 +0000
            Re: parsing tree from excel sheet Chris Kaynor <ckaynor@zindagigames.com> - 2015-01-29 14:30 -0800
            Re: parsing tree from excel sheet Chris Angelico <rosuav@gmail.com> - 2015-01-30 10:46 +1100
      Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-30 15:05 +0000
        Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-01-30 18:11 +0100
          Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-31 22:45 +0000
            Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-02-01 11:11 +0100
        Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-01-30 18:24 +0100
        Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-01-31 10:07 +0100
        Re: parsing tree from excel sheet Peter Otten <__peter__@web.de> - 2015-01-31 10:07 +0100
    Re: parsing tree from excel sheet Tim Chase <python.list@tim.thechases.com> - 2015-01-28 08:13 -0600
      Re: parsing tree from excel sheet al.basili@gmail.com (alb) - 2015-01-29 21:22 +0000

#84738 — parsing tree from excel sheet

From: al.basili@gmail.com (alb)
Date: 2015-01-28 10:12 +0000
Subject: parsing tree from excel sheet
Message-ID: <cirqviF15qtU1@mid.individual.net>

Hi everyone,

I've a document structure which is extremely simple and represented on a 
spreadsheet in the following way (a made up example):

subsystem | chapter | section | subsection | subsubsec |
    A     |         |         |            |           |
          | func0   |         |            |           |
          |         |interface|            |           |
          |         |latency  |            |           |
          |         |priority |            |           |
          | func1   |         |            |           |
          |         |interface|            |           |
          |         |latency  |            |           |
          |         |priority |            |           |
          |         |depend   |            |           |
          |         |         | variables  |           |
          |         |         |            | static    |
          |         |         |            | global    |
          |         |         | functions  |           |
          |         |         |            | internal  |
          |         |         |            | external  |

And I'd like to get a tree like this:

    A
    +-------> func0
    |           +---> interface
    |           +---> latency
    |           \---> priority
    \-------> func1
                +---> interface
                +---> latency
                +---> priority
                \---> depend
                         +---> variables
                         |         +---> static
                         |         \---> global
                         \---> functions
                                   +---> internal
                                   \---> external

I know about the xlrd module to get data from excel and I'm also aware 
of the ETE toolkit (which is more specific to bioinformatics, but I 
guess can suitably fill the need).

Does anyone recommend any other path other than scripting through these 
two modules?

Is there any more suitable module/example/project out there that would 
achieve the same result?

The reason for parsing is because the need behind is to create documents 
edited in excel but typeset in LaTeX, therefore my script will spill out 
\chapter, \section and so forth based on the tree structure.

Every node will have some text and some images with a very light markup 
like mediawiki that I can easily convert into latex.

Hope I've not been too confusing.
Thanks for any pointer/suggestion/comment.

Al

p.s.: I'm not extremely proficient in python, actually I'm just starting 
with it!

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?



#84755

From: Peter Otten <__peter__@web.de>
Date: 2015-01-28 15:08 +0100
Message-ID: <mailman.18217.1422454096.18130.python-list@python.org>
In reply to: #84738

alb wrote:

> Hi everyone,
> 
> I've a document structure which is extremely simple and represented on a
> spreadsheet in the following way (a made up example):
> 
> subsystem | chapter | section | subsection | subsubsec |
>     A     |         |         |            |           |
>           | func0   |         |            |           |
>           |         |interface|            |           |
>           |         |latency  |            |           |
>           |         |priority |            |           |
>           | func1   |         |            |           |
>           |         |interface|            |           |
>           |         |latency  |            |           |
>           |         |priority |            |           |
>           |         |depend   |            |           |
>           |         |         | variables  |           |
>           |         |         |            | static    |
>           |         |         |            | global    |
>           |         |         | functions  |           |
>           |         |         |            | internal  |
>           |         |         |            | external  |
> 
> And I'd like to get a tree like this:
> 
>     A
>     +-------> func0
>     |           +---> interface
>     |           +---> latency
>     |           \---> priority
>     \-------> func1
>                 +---> interface
>                 +---> latency
>                 +---> priority
>                 \---> depend
>                          +---> variables
>                          |         +---> static
>                          |         \---> global
>                          \---> functions
>                                    +---> internal
>                                    \---> external
> 
> I know about the xlrd module to get data from excel and I'm also aware
> about the ETE toolkit (which is more specific for bioinformatics, but I
> guess can suitable fill the need).
> 
> Does anyone recommend any other path other than scripting through these
> two modules?
> 
> Is there any more suitable module/example/project out there that would
> achieve the same result?
> 
> The reason for parsing is because the need behind is to create documents
> edited in excel but typeset in LaTeX, therefore my script will spill out
> \chapter, \section and so forth based on the tree structure.
> 
> Every node will have some text and some images with a very light markup
> like mediawiki that I can easily convert into latex.
> 
> Hope I've not been too confusing.
> Thanks for any pointer/suggestion/comment.
> 
> Al
> 
> p.s.: I'm not extremely proficient in python, actually I'm just starting
> with it!

You can save the excel sheet as csv so that you can use the csv module, which 
may be easier to use than xlrd. The rest should be doable by hand. Here's 
what I hacked together:

$ cat parse_column_tree.py
import csv

def column_index(row):
    for result, cell in enumerate(row, 0):
        if cell:
            return result
    raise ValueError


class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def append(self, child):
        self.children.append(child)

    def __str__(self):
        return "\\%s{%s}" % (self.level, self.name)

    def show(self):
        yield [self.name]
        for i, child in enumerate(self.children):
            lastchild = i == len(self.children)-1
            first = True
            for c in child.show():
                if first:
                    yield ["\\---> " if lastchild else "+---> "] + c
                    first = False
                else:
                    yield ["      " if lastchild else "|     "] + c
    def show2(self):
        yield str(self)
        for child in self.children:
            yield from child.show2()

def show(root):
    for row in root.show():
        print("".join(row))

def show2(root):
    for line in root.show2():
        print(line)

def read_tree(rows, levelnames):
    root = Node("#ROOT", "#ROOT")
    old_level = 0
    stack = [root]
    for i, row in enumerate(rows, 1):

        new_level = column_index(row)
        node = Node(row[new_level], levelnames[new_level])

        if new_level == old_level:
            stack[-1].append(node)
        elif new_level > old_level:
            if new_level - old_level != 1:
                raise ValueError

            stack.append(stack[-1].children[-1])
            stack[-1].append(node)
            old_level = new_level
        else:
            while new_level < old_level:
                stack.pop(-1)
                old_level -= 1
            stack[-1].append(node)
    return root

def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("infile")
    parser.add_argument("--latex", action="store_true")

    args = parser.parse_args()

    with open(args.infile) as f:
        rows = csv.reader(f)
        levelnames = next(rows) # skip header
        tree = read_tree(rows, levelnames)

        show_tree = show2 if args.latex else show
        for node in tree.children:
            show_tree(node)
            print("")

if __name__ == "__main__":
    main()
$ cat data.csv
subsystem,chapter,section,subsection,subsubsec,
A,,,,,
,func0,,,,
,,interface,,,
,,latency,,,
,,priority,,,
,func1,,,,
,,interface,,,
,,latency,,,
,,priority,,,
,,depend,,,
,,,variables,,
,,,,static,
,,,,global,
,,,functions,,
,,,,internal,
,,,,external,
$ python3 parse_column_tree.py data.csv
A
+---> func0
|     +---> interface
|     +---> latency
|     \---> priority
\---> func1
      +---> interface
      +---> latency
      +---> priority
      \---> depend
            +---> variables
            |     +---> static
            |     \---> global
            \---> functions
                  +---> internal
                  \---> external

$ python3 parse_column_tree.py data.csv --latex
\subsystem{A}
\chapter{func0}
\section{interface}
\section{latency}
\section{priority}
\chapter{func1}
\section{interface}
\section{latency}
\section{priority}
\section{depend}
\subsection{variables}
\subsubsec{static}
\subsubsec{global}
\subsection{functions}
\subsubsec{internal}
\subsubsec{external}







#84758

From: al.basili@gmail.com (alb)
Date: 2015-01-28 14:27 +0000
Message-ID: <cis9uqF5ddpU1@mid.individual.net>
In reply to: #84755

Hi Peter,

Peter Otten <__peter__@web.de> wrote:
[]
> You can save the excel sheet as csv so that you an use the csv module which 
> may be easier to use than xlrd. The rest should be doable by hand. Here's 
> what I hacked together:
> 
> $ cat parse_column_tree.py
> import csv
> 
> def column_index(row):
>    for result, cell in enumerate(row, 0):
>        if cell:
>            return result
>    raise ValueError
> 
> 
> class Node:
>    def __init__(self, name, level):
>        self.name = name
>        self.level = level
>        self.children = []
> 
>    def append(self, child):
>        self.children.append(child)
> 
>    def __str__(self):
>        return "\%s{%s}" % (self.level, self.name)
> 
>    def show(self):
>        yield [self.name]
>        for i, child in enumerate(self.children):
>            lastchild = i == len(self.children)-1
>            first = True
>            for c in child.show():
>                if first:
>                    yield ["\---> " if lastchild else "+---> "] + c
>                    first = False
>                else:
>                    yield ["      " if lastchild else "|     "] + c
>    def show2(self):
>        yield str(self)
>        for child in self.children:
>            yield from child.show2()
> 
> def show(root):
>    for row in root.show():
>        print("".join(row))
> 
> def show2(root):
>    for line in root.show2():
>        print(line)
> 
> def read_tree(rows, levelnames):
>    root = Node("#ROOT", "#ROOT")
>    old_level = 0
>    stack = [root]
>    for i, row in enumerate(rows, 1):
> 
>        new_level = column_index(row)
>        node = Node(row[new_level], levelnames[new_level])
> 
>        if new_level == old_level:
>            stack[-1].append(node)
>        elif new_level > old_level:
>            if new_level - old_level != 1:
>                raise ValueError
> 
>            stack.append(stack[-1].children[-1])
>            stack[-1].append(node)
>            old_level = new_level
>        else:
>            while new_level < old_level:
>                stack.pop(-1)
>                old_level -= 1
>            stack[-1].append(node)
>    return root
> 
> def main():
>    import argparse
>    parser = argparse.ArgumentParser()
>    parser.add_argument("infile")
>    parser.add_argument("--latex", action="store_true")
> 
>    args = parser.parse_args()
> 
>    with open(args.infile) as f:
>        rows = csv.reader(f)
>        levelnames = next(rows) # skip header
>        tree = read_tree(rows, levelnames)
> 
>        show_tree = show2 if args.latex else show
>        for node in tree.children:
>            show_tree(node)
>            print("")
> 
> if __name__ == "__main__":
>    main()
> $ cat data.csv
> subsystem,chapter,section,subsection,subsubsec,
> A,,,,,
> ,func0,,,,
> ,,interface,,,
> ,,latency,,,
> ,,priority,,,
> ,func1,,,,
> ,,interface,,,
> ,,latency,,,
> ,,priority,,,
> ,,depend,,,
> ,,,variables,,
> ,,,,static,
> ,,,,global,
> ,,,functions,,
> ,,,,internal,
> ,,,,external,
> $ python3 parse_column_tree.py data.csv
> A
> +---> func0
> |     +---> interface
> |     +---> latency
> |     \---> priority
> \---> func1
>      +---> interface
>      +---> latency
>      +---> priority
>      \---> depend
>            +---> variables
>            |     +---> static
>            |     \---> global
>            \---> functions
>                  +---> internal
>                  \---> external
> 
> $ python3 parse_column_tree.py data.csv --latex
> \subsystem{A}
> \chapter{func0}
> \section{interface}
> \section{latency}
> \section{priority}
> \chapter{func1}
> \section{interface}
> \section{latency}
> \section{priority}
> \section{depend}
> \subsection{variables}
> \subsubsec{static}
> \subsubsec{global}
> \subsection{functions}
> \subsubsec{internal}
> \subsubsec{external}

WOW! I didn't really want someone else to write what I needed but thanks 
a lot! That's a lot of food to digest in a single byte, so I'll first 
play a bit with it (hopefully understanding what is doing) and then come 
back with comments.

I really appreciated your time and effort.

Al



#84841

From: al.basili@gmail.com (alb)
Date: 2015-01-29 21:02 +0000
Message-ID: <civleiF21boU1@mid.individual.net>
In reply to: #84755

Hi Peter,

Peter Otten <__peter__@web.de> wrote:
[]
>    def show2(self):
>        yield str(self)
>        for child in self.children:
>            yield from child.show2()

here is what I get:

> SyntaxError: invalid syntax
> debian@debian:example$ python3 export_latex.py doctree.csv 
>   File "export_latex.py", line 36
>     yield from child.show2()
>              ^
> SyntaxError: invalid syntax

and I've tried with both python and python3 (see below versions).

debian@debian:example$ python
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
debian@debian:example$ python3
Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Is it an issue related to my installation? Shall I upgrade and/or 
downgrade?

Thanks for any hint,

Al



#84844

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-01-29 21:16 +0000
Message-ID: <mailman.18282.1422566221.18130.python-list@python.org>
In reply to: #84841

On 2015-01-29 21:02, alb wrote:
> Hi Peter,
>
> Peter Otten <__peter__@web.de> wrote:
> []
>>    def show2(self):
>>        yield str(self)
>>        for child in self.children:
>>            yield from child.show2()
>
> here is what I get:
>
>> SyntaxError: invalid syntax
>> debian@debian:example$ python3 export_latex.py doctree.csv
>>   File "export_latex.py", line 36
>>     yield from child.show2()
>>              ^
>> SyntaxError: invalid syntax
>
> and I've tried with both python and python3 (see below versions).
>
> debian@debian:example$ python
> Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
> debian@debian:example$ python3
> Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
>
> Is it an issue related to my installation? Shall I upgrade and/or
> downgrade?
>
"yield from" was introduced in Python 3.3.



#84846

From: al.basili@gmail.com (alb)
Date: 2015-01-29 21:32 +0000
Message-ID: <civn75F21boU3@mid.individual.net>
In reply to: #84844

Hi MRAB,

MRAB <python@mrabarnett.plus.com> wrote:
[]
>>> SyntaxError: invalid syntax
>>> debian@debian:example$ python3 export_latex.py doctree.csv
>>>   File "export_latex.py", line 36
>>>     yield from child.show2()
>>>              ^
>>> SyntaxError: invalid syntax
>>
>> and I've tried with both python and python3 (see below versions).
>>
>> debian@debian:example$ python
>> Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
>> [GCC 4.4.5] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>
>> debian@debian:example$ python3
>> Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10)
>> [GCC 4.4.5] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>
>>
>> Is it an issue related to my installation? Shall I upgrade and/or
>> downgrade?
>>
> "yield from" was introduced in Python 3.3.
> 

Ok, that either means I need to upgrade to 3.3 or need to modify the 
snippet to a suitable syntax that would work with other versions.

Considering that upgrading is something that I'm not keen to do on my 
production system, I believe I only have one available choice.

It seems I could use the generator and iterate with .next() in python 
2.6, at least from what I found here:
http://stackoverflow.com/questions/1756096/understanding-generators-in-python

Al



#84847

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-01-29 21:59 +0000
Message-ID: <mailman.18283.1422568807.18130.python-list@python.org>
In reply to: #84846

On 29/01/2015 21:32, alb wrote:
> Hi MRAB,
>
> MRAB <python@mrabarnett.plus.com> wrote:
> []
>>>> SyntaxError: invalid syntax
>>>> debian@debian:example$ python3 export_latex.py doctree.csv
>>>>    File "export_latex.py", line 36
>>>>      yield from child.show2()
>>>>               ^
>>>> SyntaxError: invalid syntax
>>>
>>> and I've tried with both python and python3 (see below versions).
>>>
>>> debian@debian:example$ python
>>> Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
>>> [GCC 4.4.5] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>
>>> debian@debian:example$ python3
>>> Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10)
>>> [GCC 4.4.5] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>
>>>
>>> Is it an issue related to my installation? Shall I upgrade and/or
>>> downgrade?
>>>
>> "yield from" was introduced in Python 3.3.
>>
>
> Ok, that either means I need to upgrade to 3.3 or need to modify the
> snippet to a suitable syntax that would work with other versions.
>
> Considering that upgrading is something that I'm not keen to do on my
> production system I believe I've only have one available choice.
>
> It seems I could use the generator and iterate with .next() in python
> 2.6, at least from what I found here:
> http://stackoverflow.com/questions/1756096/understanding-generators-in-python
>
> Al
>

I'd be inclined to upgrade, see here 
https://www.python.org/dev/peps/pep-0380/#formal-semantics for why :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#84850

From: Chris Kaynor <ckaynor@zindagigames.com>
Date: 2015-01-29 14:30 -0800
Message-ID: <mailman.18284.1422570630.18130.python-list@python.org>
In reply to: #84846

On Thu, Jan 29, 2015 at 1:59 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> On 29/01/2015 21:32, alb wrote:
>>
>> Hi MRAB,
>>
>> MRAB <python@mrabarnett.plus.com> wrote:
>> []
>>>>>
>>>>> SyntaxError: invalid syntax
>>>>> debian@debian:example$ python3 export_latex.py doctree.csv
>>>>>    File "export_latex.py", line 36
>>>>>      yield from child.show2()
>>>>>               ^
>>>>> SyntaxError: invalid syntax
>>>>
>>>>
>>>> and I've tried with both python and python3 (see below versions).
>>>>
>>>> debian@debian:example$ python
>>>> Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
>>>> [GCC 4.4.5] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>>
>>>>>>>
>>>> debian@debian:example$ python3
>>>> Python 3.1.3 (r313:86834, Nov 28 2010, 11:28:10)
>>>> [GCC 4.4.5] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>>
>>>>>>>
>>>>
>>>> Is it an issue related to my installation? Shall I upgrade and/or
>>>> downgrade?
>>>>
>>> "yield from" was introduced in Python 3.3.
>>>
>>
>> Ok, that either means I need to upgrade to 3.3 or need to modify the
>> snippet to a suitable syntax that would work with other versions.
>>
>> Considering that upgrading is something that I'm not keen to do on my
>> production system I believe I've only have one available choice.
>>
>> It seems I could use the generator and iterate with .next() in python
>> 2.6, at least from what I found here:
>>
>> http://stackoverflow.com/questions/1756096/understanding-generators-in-python
>>
>> Al
>>
>
> I'd be inclined to upgrade, see here
> https://www.python.org/dev/peps/pep-0380/#formal-semantics for why :)

While that is true, most of that code is needed to handle the odd
corner cases and exceptions that could happen, as well as supporting
generator.throw and generator.send, while also ensuring proper and
quick clean-up of the objects.

From what I could see at a quick glance, none of that is really needed
in the simple case in the posted code, and as such, it is LIKELY safe
to just replace "yield from ..." with "for item in ...: yield item".
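
As a rough sketch of that replacement (assuming a Node class cut down from
the posted script to just what show2() needs; the three sample nodes are
made up for illustration), the following avoids "yield from" and uses only
syntax that Python 2.6 and 3.1 should accept:

```python
class Node(object):
    """Trimmed-down stand-in for the posted Node class (illustration only)."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def append(self, child):
        self.children.append(child)

    def __str__(self):
        return "\\%s{%s}" % (self.level, self.name)

    def show2(self):
        yield str(self)
        for child in self.children:
            # Plain-iteration equivalent of "yield from child.show2()".
            for item in child.show2():
                yield item


# A tiny made-up tree: A -> func0 -> interface
root = Node("A", "subsystem")
func0 = Node("func0", "chapter")
root.append(func0)
func0.append(Node("interface", "section"))

lines = list(root.show2())
for line in lines:
    print(line)
```

The nested loop only forwards values, which is all the posted code relies
on; it does not forward send() or throw() the way "yield from" does.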

Chris



#84857

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-01-30 10:46 +1100
Message-ID: <mailman.18290.1422575211.18130.python-list@python.org>
In reply to: #84846

On Fri, Jan 30, 2015 at 8:32 AM, alb <al.basili@gmail.com> wrote:
> Ok, that either means I need to upgrade to 3.3 or need to modify the
> snippet to a suitable syntax that would work with other versions.

You could replace "yield from child.show2()" with:

for val in child.show2(): yield val

and it should work. However, you're running Python 3.1, and a *lot* of
improvements have been made since then, so it's well worth upgrading.

ChrisA



#84890

From: al.basili@gmail.com (alb)
Date: 2015-01-30 15:05 +0000
Message-ID: <cj1kt9Fi3e0U1@mid.individual.net>
In reply to: #84755

Hi Peter, I'll try to comment the code below to verify if I understood 
it correctly or missing some major parts. Comments are just below code 
with the intent to let you read the code first and my understanding 
afterwards.

Peter Otten <__peter__@web.de> wrote:
[]
> $ cat parse_column_tree.py
> import csv
> 
> def column_index(row):
>    for result, cell in enumerate(row, 0):
>        if cell:
>            return result
>    raise ValueError

Here you get the depth of your first node in this row.

> class Node:
>    def __init__(self, name, level):
>        self.name = name
>        self.level = level
>        self.children = []
> 
>    def append(self, child):
>        self.children.append(child)
> 
>    def __str__(self):
>        return "\%s{%s}" % (self.level, self.name)

Up to here everything is fine, essentially defining the basic methods 
for the node object. A node is represented univocally with its name and 
the level. Here I could say that two nodes with the same name cannot be 
on the same level but this is cosmetic.

The important part would be that 'Name' can be also 'Attributes', with a 
dictionary instead. This would allow to store more information on each 
node.

>    def show(self):
>        yield [self.name]

Here I'm lost in translation! Why use yield in the first place?
What is this snippet used for?


>        for i, child in enumerate(self.children):
>            lastchild = i == len(self.children)-1
>            first = True
>            for c in child.show():
>                if first:
>                    yield ["\---> " if lastchild else "+---> "] + c
>                    first = False
>                else:
>                    yield ["      " if lastchild else "|     "] + c

Here I understand more, essentially 'yield' returns a string that would 
be used further down in the show(root) function. Yet I doubt that I 
grasp the true meaning of the code. It seems those 'show' functions have 
lots of iterations that I'm not quite able to trace. Here you loop over 
children, as well as in the main()...

>    def show2(self):
>        yield str(self)
>        for child in self.children:
>            yield from child.show2()

ok, this as well requires some explanation. Kinda lost again. From what 
I can naively deduce is that it is a generator that returns the str 
defined in the node as __str__ and it shows it for the whole tree.

> def show(root):
>    for row in root.show():
>        print("".join(row))
> 
> def show2(root):
>    for line in root.show2():
>        print(line)

Here we implement the functions to print a node, but I'm not sure I 
understand why do I have to iterate if the main() iterates again over the 
nodes.

> 
> def read_tree(rows, levelnames):
>    root = Node("#ROOT", "#ROOT")
>    old_level = 0
>    stack = [root]
>    for i, row in enumerate(rows, 1):

I'm not quite sure I understand what is the stack for. As of now is a 
list whose only element is root.

>        new_level = column_index(row)
>        node = Node(row[new_level], levelnames[new_level])

here you are getting the node based on the current row, with its level.

>        if new_level == old_level:
>            stack[-1].append(node)

I'm not sure I understand here. Why the end of the list and not the 
beginning?

>        elif new_level > old_level:
>            if new_level - old_level != 1:
>                raise ValueError

here you avoid having a node which is distant more than one level from 
its parent.

>            stack.append(stack[-1].children[-1])

here I get a crash: IndexError: list index out of range!

>            stack[-1].append(node)
>            old_level = new_level
>        else:
>            while new_level < old_level:
>                stack.pop(-1)
>                old_level -= 1
>            stack[-1].append(node)

Why do I need to pop something from the stack??? Here you are saying 
that if current row has a depth (new_level) that is smaller than the 
previous one (old_level) I decrement by one the old_level (even if I may 
have a bigger jump) and pop something from the stack...???

>    return root

once filled, the tree is returned. I thought the tree would have been 
the stack, but instead is root...nice surprise.

> 
> def main():
[strip arg parsing]

>    with open(args.infile) as f:
>        rows = csv.reader(f)
>        levelnames = next(rows) # skip header
>        tree = read_tree(rows, levelnames)

filling the tree with the data in the csv.

> 
>        show_tree = show2 if args.latex else show
>        for node in tree.children:
>            show_tree(node)
>            print("")

It's nice to define show_tree as a function of the argument. The for 
loop now is more than clear, traversing each node of the tree.

As I said earlier in the thread there's a lot of food for a newbie, but 
better to go through this sort of exercise than dumb tutorials which 
don't teach you much.

Al



#84904

From: Peter Otten <__peter__@web.de>
Date: 2015-01-30 18:11 +0100
Message-ID: <mailman.18314.1422637907.18130.python-list@python.org>
In reply to: #84890

alb wrote:

> Hi Peter, I'll try to comment the code below to verify if I understood
> it correctly or missing some major parts. Comments are just below code
> with the intent to let you read the code first and my understanding
> afterwards.

Let's start with the simplest:
 
> Peter Otten <__peter__@web.de> wrote:

>>    def show2(self):
>>        yield str(self)
>>        for child in self.children:
>>            yield from child.show2()
> 
> ok, this as well requires some explanation. Kinda lost again. From what
> I can naively deduce is that it is a generator that returns the str
> defined in the node as __str__ and it shows it for the whole tree.

Given a tree

A --> A1
      A2 --> A21
             A22
      A3

assume a slightly modified show2():

def append_nodes(node, nodes):
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)

When you invoke this with the root node in the above sample tree and an 
empty list

nodes = []
append_nodes(A, nodes)

the first thing it will do is append the root node to the nodes list

[A]

Then it iterates over A's children:

append_nodes(A1, nodes) will append A1 and return immediately because A1 
itself has no children.

[A, A1]

append_nodes(A2, nodes) will append A2 and then iterate over A2's children.
As A21 and A22 don't have any children append_nodes(A21, nodes) and 
append_nodes(A22, nodes) will just append the respective node with no 
further nested ("recursive") invocation, and thus the list is now

[A, A1, A2, A21, A22]

Finally the append_nodes(A3, nodes) will append A3 and then return because 
it has no children, and we end up with

nodes = [A, A1, A2, A21, A22, A3]
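
The walk-through can be checked with a small script. This is only a sketch:
the Node class here is a hypothetical minimal one (a name plus a list of
children), not the class from parse_column_tree.py.

```python
class Node:
    """Hypothetical minimal node: just a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def append_nodes(node, nodes):
    # Append the node itself first, then recurse into each child in order.
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)


# The sample tree: A has children A1, A2, A3; A2 has children A21, A22.
A = Node("A", [Node("A1"), Node("A2", [Node("A21"), Node("A22")]), Node("A3")])

nodes = []
append_nodes(A, nodes)
names = [node.name for node in nodes]
print(names)
```

Each node is appended before its own children, so A2 lands in the list
ahead of A21 and A22.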

Now why the generator? For such a small problem it doesn't matter; for large 
datasets it is convenient that you can process the first item immediately, 
when the following ones may not yet be available. It also becomes easier to 
implement different treatment of the items or to stop in the process:

for deer in hunt():
    kill(deer)
    if have_enough_food():
        break

for animal in hunt():
    take_photograph(animal)
    if not_enough_light():
        break

Also, for many problems you never need more than one item in memory at a 
time, rather than the whole list.

Ok, how to get from the recursive list building to yielding nodes as they 
are encountered? The basic process is always the same:

def f(items):
    items.append(3)
    items.append(6)
    for i in range(10):
        items.append(i)

items = []
f(items)
for item in items:
    print(item)

becomes

def g():
    yield 3
    yield 6
    for i in range(10):
        yield i

for item in g():
    print(item)

Python 3.3 added some syntactic sugar so that you can write

def g():
    yield 3
    yield 6
    yield from range(10)


Thus

def append_nodes(node, nodes):
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)


becomes

def generate_nodes(node):
    yield node
    for child in node.children:
        yield from generate_nodes(child)
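
A runnable version of the generator, again with an assumed minimal Node 
class (name plus children) standing in for the real one. It needs Python 
3.3 or later because of `yield from`.

```python
class Node:
    """Assumed minimal node: a name and a list of child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def generate_nodes(node):
    # Yield the node itself, then everything below each child.
    yield node
    for child in node.children:
        yield from generate_nodes(child)


tree = Node("A", Node("A1"), Node("A2", Node("A21"), Node("A22")), Node("A3"))

names = [node.name for node in generate_nodes(tree)]
print(names)  # ['A', 'A1', 'A2', 'A21', 'A22', 'A3']

# Because the nodes arrive one by one, a search can stop at the first hit.
first_leaf = next(n for n in generate_nodes(tree) if not n.children)
print(first_leaf.name)  # A1
```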

This looks a lot like show2() except that it's not a method, so the node 
is not called self, and that the node itself is yielded rather than 
str(node). The latter makes the function a bit more flexible and is what I 
should have done in the first place.

The show() method is basically the same, but there are varying prefixes 
before the node name. Here's a simpler variant that just adds some 
indentation. We start with generate_nodes() without the syntactic sugar. 
This is because we need a name for the nodes yielded from the nested 
generator call so that we can modify them:

def indented_nodes(node):
    yield node
    for child in node.children:
        for desc in indented_nodes(child):
            yield desc

Now let's modify the yielded nodes:

def indented_nodes(node):
    yield [node]
    for child in node.children:
        for desc in indented_nodes(child):
            yield ["***"] + desc

How does it fare on the example tree? 

A --> A1
      A2 --> A21
             A22
      A3

The lists will have a "***" entry for every nesting level, so we get

[A]
["***", A1]
["***", A2]
["***", "***", A21]
["***", "***", A22]
["***", A3]

With "".join() we can print it nicely:

for item in indented_nodes(tree):
    print("".join(item))

But wait, "".join() only accepts strings so let's change

    yield [node]

to 
    yield [node.name] # str(node) would also work

A
***A1
***A2
******A21
******A22
***A3
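
Put together, the indentation variant looks like the sketch below. The 
Node class is an assumption (only `name` and `children` are needed).

```python
class Node:
    """Assumed minimal node: a name and a list of child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def indented_nodes(node):
    # Each item is a list of strings; every level of recursion
    # prepends one "***" marker to the items coming from below.
    yield [node.name]
    for child in node.children:
        for desc in indented_nodes(child):
            yield ["***"] + desc


tree = Node("A", Node("A1"), Node("A2", Node("A21"), Node("A22")), Node("A3"))

lines = ["".join(item) for item in indented_nodes(tree)]
for line in lines:
    print(line)
```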

>> def show2(root):
>>    for line in root.show2():
>>        print(line)

> Here we implement the functions to print a node, but I'm not sure I 
> understand why do I have to iterate if the main() iterates again over the 
> nodes.

Your example had the structure

A
 A1
   A11
   A12
 A2

and I was unsure if there could be data files that have multiple root nodes, 
e. g.

A
 A1
   A11
   A12
 A2
B
 B1
 B2

To simplify the handling of these I introduced an artificial root R

R
 A
  A1
    A11
    A12
  A2
 B
  B1
  B2

which makes all toplevel nodes in the data file children of R. In the
main() function I iterate over R's children to hide R from the user.

You can replace

        for node in tree.children:
            show_tree(node)
            print("")

in my original code with

        show_tree(tree)

to see the hidden node.
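
The effect of the artificial root can be sketched as follows. Both the 
Node class and tree_lines() are assumptions made for this illustration; 
tree_lines() is a simplified stand-in, not the show_tree() from the 
original code.

```python
class Node:
    """Assumed minimal node: a name and a list of child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def tree_lines(node, depth=0):
    # One line per node, indented by one space per nesting level.
    yield " " * depth + node.name
    for child in node.children:
        yield from tree_lines(child, depth + 1)


R = Node(
    "R",
    Node("A", Node("A1", Node("A11"), Node("A12")), Node("A2")),
    Node("B", Node("B1"), Node("B2")),
)

# Iterating over R's children hides R from the user ...
hidden_root = []
for top in R.children:
    hidden_root.extend(tree_lines(top))

# ... while starting at R itself shows it.
with_root = list(tree_lines(R))

print("\n".join(hidden_root))
print("\n".join(with_root))
```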

I may address the rest of your post later unless someone else does. In the 
meantime, can you please provide the data file that triggers the IndexError 
to help me with the debugging?



#84966

From: al.basili@gmail.com (alb)
Date: 2015-01-31 22:45 +0000
Message-ID: <cj547rFfl2sU1@mid.individual.net>
In reply to: #84904
Hi Peter,

Peter Otten <__peter__@web.de> wrote:
[]
> Let's start with the simplest:
> 
>> Peter Otten <__peter__@web.de> wrote:
> 
>>>    def show2(self):
>>>        yield str(self) 
>>>        for child in self.children:
>>>            yield from child.show2()
[]
> 
> Given a tree
> 
> A --> A1
>      A2 --> A21
>             A22
>      A3
> 
> assume a slightly modified show2():
> 
> def append_nodes(node, nodes):
>    nodes.append(node)
>    for child in node.children:
>        append_nodes(child, nodes)

I'm assuming you are referring to the method in the Node class.

> 
> When you invoke this with the root node in the above sample tree and 
> an empty list
> 
> nodes = []
> append_nodes(A, nodes)
> 
> the first thing it will do is append the root node to the nodes list
> 
> [A]
> 
> Then it iterates over A's children:
> 
> append_nodes(A1, nodes) will append A1 and return immediately because 
> A1 itself has not children.
> 
> [A, A1]
> 
> append_nodes(A2, nodes) will append A2 and then iterate over A2's 
> children. As A21 and A22 don't have any children append_nodes(A21, 
> nodes) and append_nodes(A22, nodes) will just append the respective 
> node with no further nested ("recursive") invocation, and thus the 
> list is now
> 
> [A, A1, A21, A22]
> 
> Finally the append_nodes(A3, nodes) will append A3 and then return 
> because it has no children, and we end up with
> 
> nodes = [A, A1, A21, A22, A3]

So the recursive function will append children as long as there are any, 
traversing the whole tree structure (yep, I saw the missing A2 in the 
list as you mentioned already).
 
> Now why the generator? For such a small problem it doesn't matter, for 
> large datasets it is convenient that you can process the first item 
> immmediately, when the following ones may not yet be available.

I've read something about generators and they are a strong concept 
(especially for a C-minded guy like me!).

[]
> Ok, how to get from the recursive list building to yielding nodes as 
> they are encountered? The basic process is always the same:
> 
> def f(items)
>   items.append(3)
>   items.append(6)
>   for i in range(10):
>       items.append(i)
> 
> items = []
> f(items)
> for item in items:
>   print(item)
> 
> becomes
> 
> def g():
>    yield 3
>    yield 6
>    for i in range(10):
>        yield i
> 
> for item in g():
>    print(items)
---------------^ should be item and not items.

> 
> In Python 3.3 there was added some syntactic sugar so that you can 
> write
> 
> def g():
>    yield 3
>    yield 6
>    yield from range(10)
> 
> 
> Thus
> 
> def append_nodes(node, nodes):
>    nodes.append(node)
>    for child in node.children:
>        append_nodes(child, nodes)
> 
> 
> becomes
> 
> def generate_nodes(node):
>    yield node
>    for child in node.children:
>        yield from generate_nodes(child)

I'm with you now! I guess it would have been nearly impossible to see 
the real picture behind.

> This looks a lot like show2() except that it's not a method and thus 
> the node not called self and that the node itself is yielded rather 
> than str(node). The latter makes the function a bit more flexible and 
> is what I should have done in the first place.

Indeed returning the node might be more useful than just yielding its 
string.

> 
> The show() method is basically the the same, but there are varying 
> prefixes before the node name. Here's a simpler variant that just adds 
> some indentation. We start with generate_nodes() without the syntactic 
> sugar. This is because we need a name for the nodes yielded from the 
> nested generator call so that we can modify them:
> 
> def indented_nodes(node):
>    yield node
>    for child in node.children:
>        for desc in from indented_nodes(child):
>            yield desc
> 
> Now let's modify the yielded nodes:
> 
> def indented_nodes(node):
>    yield [node]

Why has this line changed from 'yield node'?

>    for child in node.children:
>        for desc in indented_nodes(child):
>            yield ["***"] + desc

Ok, the need for manipulation does not allow us to use the syntactic sugar 
above.

> 
> How does it fare on the example tree? 
> 
> A --> A1
>      A2 --> A21
>             A22
>      A3
> 
> The lists will have an "***" entry for every nesting level, so we get
> 
> [A]
> ["***", A1]
> ["***", A2]
> ["***", "***", A21]
> ["***", "***", A22]
> ["***", A3]
> 
> With "".join() we can print it nicely:
> 
> for item in indented_nodes(tree):
>    print("".join(item))
> 
> But wait, "".join() only accepts strings so let's change
> 
>    yield [node]
> 
> to 
>    yield [node.name] # str(node) would also work

Again my question, why not simply yield node.name?

> A
> ***A1
> ***A2
> ******A21
> ******A22
> ***A3
> 
>>> def show2(root):
>>>    for line in root.show2():
>>>        print(line)
> 
>> Here we implement the functions to print a node, but I'm not sure I 
>> understand why do I have to iterate if the main() iterates again over the 
>> nodes.
> 
> Your example had the structure
> 
> A
> A1
>   A11
>   A12
> A2
> 
> and I was unsure if there could be data files that have multiple root 
> nodes, e. g.
> 
> A
> A1
>   A11
>   A12
> A2
> B
> B1
> B2
> 
> To simplify the handling of these I introduced an artificial root R
> 
> R
> A
>  A1
>    A11
>    A12
>  A2
> B
>  B1
>  B2
> 
> which makes all toplevel nodes in the data file children of R. In the
> main() function I iterate over R's children to hide R from the user.
> 
> You can replace
> 
>        for node in tree.children:
>            show_tree(node)
>            print("")
> 
> in my original code with
> 
>        show_tree(tree)
> 
> to see the hidden node.

That makes it clearer. Indeed, this is what is going to happen in reality, 
since I'll have several subsystems at the very same level, all of them 
children of a #ROOT node.

> 
> I may address the rest of your post later unless someone else does. In 
> the mean time, can you please provide the data file that triggers the 
> IndexError to help me with the debugging?

There was a mistake in my file indeed, now that I fixed it everything 
works!

Al



#84992

From: Peter Otten <__peter__@web.de>
Date: 2015-02-01 11:11 +0100
Message-ID: <mailman.18356.1422785505.18130.python-list@python.org>
In reply to: #84966
alb wrote:

>> But wait, "".join() only accepts strings so let's change
>>
>>    yield [node]
>>
>> to
>>    yield [node.name] # str(node) would also work
> 
> Again my question, why not simply yield node.name?

I've been conditioned to build a string from many substrings like so

>>> parts = ["foo", "bar", "baz"] 
>>> text = "".join(parts)
>>> text
'foobarbaz'

instead of

>>> text = ""
>>> for part in parts:
...     text += part
... 
>>> text
'foobarbaz'

mostly because "".join(...) was initially advertised as being faster than +=.

For nested generators or functions this translates into

>>> def outer_list():
...     return ["foo"] + inner_list()
... 
>>> def inner_list():
...     return ["bar"]
... 
>>> "".join(outer_list())
'foobar'

instead of the obvious

>>> def outer():
...     return "foo" + inner()
... 
>>> def inner():
...     return "bar"
... 
>>> outer()
'foobar'

Here the list-based approach may build many intermediate throwaway-lists, so 
it's most likely less efficient than direct string concatenation. In return 
it gives you flexibility:

>>> "/".join(outer_list())
'foo/bar'
>>> "-->".join(outer_list())
'foo-->bar'

You'd have to pass the separator as an argument to outer/inner() to achieve 
this with the seemingly simpler approach. But there's more:

You can reverse the order, 

>>> ":".join(reversed(outer_list()))
'bar:foo'

treat the innermost string differently, translate each part into a different 
language, clip common levels, etc. Your options are unlimited.
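
Applied to the tree generator, the list-based approach might look like the 
sketch below. The paths() function and the Node class are assumptions for 
illustration only: each yielded item is the list of names from the root 
down to one node, and the caller decides how to combine the parts.

```python
class Node:
    """Assumed minimal node: a name and a list of child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def paths(node):
    # Yield one list of names per node, root first.
    yield [node.name]
    for child in node.children:
        for path in paths(child):
            yield [node.name] + path


tree = Node("A", Node("A1"), Node("A2", Node("A21")))

slashed = ["/".join(p) for p in paths(tree)]
print(slashed)    # ['A', 'A/A1', 'A/A2', 'A/A2/A21']

arrows = ["-->".join(p) for p in paths(tree)]
print(arrows)     # ['A', 'A-->A1', 'A-->A2', 'A-->A2-->A21']

# Reversing the order needs no change to the generator.
backwards = [":".join(reversed(p)) for p in paths(tree)]
print(backwards)  # ['A', 'A1:A', 'A2:A', 'A21:A2:A']
```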

I never understood why the file system API operates on a single string...



#84905

From: Peter Otten <__peter__@web.de>
Date: 2015-01-30 18:24 +0100
Message-ID: <mailman.18315.1422638708.18130.python-list@python.org>
In reply to: #84890
Peter Otten wrote:

> [A, A1, A21, A22]
> 
> Finally the append_nodes(A3, nodes) will append A3 and then return because
> it has no children, and we end up with
> 
> nodes = [A, A1, A21, A22, A3]
 
Yay, proofreading! Both lists should contain A2:

[A, A1, A2, A21, A22]

nodes = [A, A1, A2, A21, A22, A3]



#84941

From: Peter Otten <__peter__@web.de>
Date: 2015-01-31 10:07 +0100
Message-ID: <mailman.18332.1422695301.18130.python-list@python.org>
In reply to: #84890
alb wrote:

> > 
> > def read_tree(rows, levelnames):
> >    root = Node("#ROOT", "#ROOT")
> >    old_level = 0
> >    stack = [root]
> >    for i, row in enumerate(rows, 1):
> 
> I'm not quite sure I understand what is the stack for. As of now is a 
> list whose only element is root.

The stack is used to emulate the function call stack in a recursive 
solution. Every nested call produces a new set of local variables and the 
innermost call corresponds to the top of the stack or the end of the `stack` 
list in my code.

Given a tree

A
  A1
    A11
    A12
  A2

or in another notation

0 A
1 A1
2 A11
2 A12
1 A2

we start with a stack (with the node's children in parens)

[A()]

then look at A1 and see that the level is one deeper than that of A and 
append it to A, the current TOS (top of stack)

[A(A1)]

Next is A11 which is two levels deeper than A, so we take the last child of 
A and put it on the stack, 

[A(A1), A1()]

then add A11 as a child to the new TOS

[A(A1), A1(A11)]

Next is A12 which is one level deeper than A1, so we add it to the current 
TOS

[A(A1), A1(A11, A12)]

Next is A2 which is one level higher than A12, the previous row, so we keep 
removing nodes from the stack until the current TOS is one level higher 
than A2. Here we only have to remove A1 from the stack to get

[A(A1)]

and after adding A2 to the TOS

[A(A1, A2)]

I have omitted the children of the children so far, but the actual 
structure is now

[A(A1(A11(), A12()), A2())]

Therefore I just need to return A which contains the complete layout of the 
tree.
 
> >        new_level = column_index(row)
> >        node = Node(row[new_level], levelnames[new_level])
> 
> here you are getting the node based on the current row, with its level.
> 
> >        if new_level == old_level:
> >            stack[-1].append(node)
> 
> I'm not sure I understand here. Why the end of the list and not the 
> beginning?
> 
> >        elif new_level > old_level:
> >            if new_level - old_level != 1:
> >                raise ValueError
> 
> here you avoid having a node which is distant more than one level from 
> its parent.
> 
> >            stack.append(stack[-1].children[-1])
> 
> here I get a crash: IndexError: list index out of range!
> 
> >            stack[-1].append(node)
> >            old_level = new_level
> >        else:
> >            while new_level < old_level:
> >                stack.pop(-1)
> >                old_level -= 1
> >            stack[-1].append(node)
> 
> Why do I need to pop something from the stack??? Here you are saying 
> that if current row has a depth (new_level) that is smaller than the 
> previous one (old_level) I decrement by one the old_level (even if I may 
> have a bigger jump) and pop something from the stack...???
> 
> >    return root
> 
> once filled, the tree is returned. I thought the tree would have been 
> the stack, but instead is root...nice surprise.


> Why do I need to pop something from the stack???

That got me thinking, and I found that my code is indeed much more complex 
than necessary. Sorry for that ;)

As compensation, here is the simplified non-popping version of read_tree():

def read_tree(rows, levelnames):
    root = Node("#ROOT", "#ROOT")
    parents = [root] + [None] * len(levelnames)

    for row in rows:
        level = column_index(row)
        node = Node(row[level], levelnames[level])

        parents[level].append(node)
        parents[level+1] = node

    return root
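
To try the simplified version in isolation, the sketch below supplies the 
missing pieces. The Node class and column_index() shown here are 
assumptions reconstructed from how read_tree() uses them (a name, a level 
name, an append() method, and "index of the first non-empty cell"); the 
originals may differ. The rows are a shortened version of the made-up 
spreadsheet from the first post.

```python
class Node:
    """Assumed node: a name, the name of its level, and its children."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def append(self, child):
        self.children.append(child)


def column_index(row):
    """Assumed helper: index of the first non-empty cell in the row."""
    for index, cell in enumerate(row):
        if cell:
            return index
    raise ValueError("empty row")


def read_tree(rows, levelnames):
    root = Node("#ROOT", "#ROOT")
    # parents[level] is the node that receives new nodes at that level.
    parents = [root] + [None] * len(levelnames)

    for row in rows:
        level = column_index(row)
        node = Node(row[level], levelnames[level])

        parents[level].append(node)
        parents[level + 1] = node

    return root


levelnames = ["subsystem", "chapter", "section"]
rows = [
    ["A", "", ""],
    ["", "func0", ""],
    ["", "", "interface"],
    ["", "", "latency"],
    ["", "func1", ""],
    ["", "", "priority"],
]

root = read_tree(rows, levelnames)
subsystem = root.children[0]
for chapter in subsystem.children:
    print(chapter.name, [section.name for section in chapter.children])
```

As far as I can tell from the code, this version no longer rejects a row 
that skips a level: such a row either hits a None parent or is attached to 
a stale parent from an earlier branch, so a separate check may be worth 
adding if the input is not trusted.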






#84759

From: Tim Chase <python.list@tim.thechases.com>
Date: 2015-01-28 08:13 -0600
Message-ID: <mailman.18220.1422455440.18130.python-list@python.org>
In reply to: #84738
On 2015-01-28 10:12, alb wrote:
> I've a document structure which is extremely simple and represented
> on a spreadsheet in the following way (a made up example):
> 
> subsystem | chapter | section | subsection | subsubsec |
>     A     |         |         |            |           |
>           | func0   |         |            |           |
>           |         |interface|            |           |
>           |         |latency  |            |           |
>           |         |priority |            |           |
>           | func1   |         |            |           |
>           |         |interface|            |           |
>           |         |latency  |            |           |
>           |         |priority |            |           |
> 
> And I'd like to get a tree like this:
> 
>     A
>     +-------> func0
>     |           +---> interface
>     |           +---> latency
>     |           \---> priority
>     \-------> func1
>                 +---> interface
>                 +---> latency
>                 +---> priority
> 
> I know about the xlrd module to get data from excel

If I have to get my code to read Excel files, xlrd is usually my
first and only stop.

> Does anyone recommend any other path other than scripting through
> these two modules?

Well, if you export from Excel as CSV, you can use the "csv" module
in the standard library.  This is actually my preferred route because
it prevents people (coughclientscough) from messing up the CSV file
with formatting, joined cells, and other weirdnesses that can choke
my utilities.
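
A minimal sketch of the CSV route, assuming the sheet was exported with 
the level names in the first row and one non-empty cell per data row. The 
inline sample data is made up; for a real file you would pass 
open(filename, newline="") to csv.reader instead of the StringIO object.

```python
import csv
import io

# Made-up sample standing in for an exported spreadsheet.
data = io.StringIO(
    "subsystem,chapter,section\n"
    "A,,\n"
    ",func0,\n"
    ",,interface\n"
    ",,latency\n"
)

reader = csv.reader(data)
levelnames = next(reader)  # the header row names the levels

found = []
for row in reader:
    # The level of a row is the index of its first non-empty cell.
    level = next(i for i, cell in enumerate(row) if cell.strip())
    found.append((levelnames[level], row[level]))

for levelname, name in found:
    print(levelname, name)
```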

> Is there any more suitable module/example/project out there that
> would achieve the same result?

I don't believe there's anything that will natively do the work for
you.  Additionally, you'd have to clarify what should happen if two
rows in the same section had different sub-trees but the same
content/name.  Based on your use-case (LaTeX export using these as
headers) I suspect you'd want a warning so you can repair the input
and re-run.  But it would be possible to default to either keeping or
squashing the duplicates.
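
One way to produce such a warning is sketched below. It assumes the tree 
has already been built from nodes with `name` and `children` attributes; 
the Node class and duplicate_siblings() are hypothetical names for this 
illustration, and what counts as a duplicate (here: the same name twice 
under one parent) is a design choice, not something the thread settles.

```python
from collections import Counter


class Node:
    """Assumed minimal node: a name and a list of child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def duplicate_siblings(node, path=()):
    """Yield (path, name) for each name used more than once under one parent."""
    here = path + (node.name,)
    counts = Counter(child.name for child in node.children)
    for name, count in counts.items():
        if count > 1:
            yield here, name
    for child in node.children:
        yield from duplicate_siblings(child, here)


tree = Node(
    "A",
    Node("func0", Node("interface"), Node("latency"), Node("latency")),
    Node("func1", Node("interface"), Node("latency")),
)

problems = list(duplicate_siblings(tree))
for path, name in problems:
    print("warning: duplicate %r under %s" % (name, "/".join(path)))
```

The same section names under different chapters (interface and latency 
under both func0 and func1) are not reported, only repeats under the same 
parent.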

> p.s.: I'm not extremely proficient in python, actually I'm just
> starting with it!

Well, you've come to the right place. Most of us are pretty fond of
Python here. :-)

-tkc





#84845

From: al.basili@gmail.com (alb)
Date: 2015-01-29 21:22 +0000
Message-ID: <civmkcF21boU2@mid.individual.net>
In reply to: #84759
Hi Tim,

Tim Chase <python.list@tim.thechases.com> wrote:
[]
>> I know about the xlrd module to get data from excel
> 
> If I have to get my code to read Excel files, xlrd is usually my
> first and only stop.
> 

It provides quite a good interface for working with Excel files, and I 
find it pretty easy even at my entry level!

>> Does anyone recommend any other path other than scripting through
>> these two modules?
> 
> Well, if you export from Excel as CSV, you can use the "csv" module
> in the standard library.  This is actually my preferred route because
> it prevents people (coughclientscough) from messing up the CSV file
> with formatting, joined cells, and other weirdnesses that can choke
> my utilities.

In my case there's no such risk of manipulating the Excel file. I'm in 
charge of it! :-) Sure, it might at a later stage be misused and messed 
up inadvertently, but we're just trying to validate an idea, i.e. writing 
specs without using any word processor.

I'm trying to bypass the need to go through a mark up language (to a 
certain point), in order to facilitate the transition from an 
unstructured approach to document writing to a more structured one.

I would have proposed SGML or XML and style sheets, but unfortunately it is 
hard to move from M$Word to XML (OMG I need to write code?!?!!). So to 
facilitate the transition to a structured approach I've come up with the 
idea of generating documents automatically, using Excel as a UI.

At a later stage we could move to a full-fledged database and have 
simpler web access, but use the same backend for generating documents 
(i.e. some parser and LaTeX).

>> Is there any more suitable module/example/project out there that
>> would achieve the same result?
> 
> I don't believe there's anything that will natively do the work for
> you.  Additionally, you'd have to clarify what should happen if two
> rows in the same section had different sub-trees but the same
> content/name.  Based on your use-case (LaTex export using these as
> headers) I suspect you'd want a warning so you can repair the input
> and re-run.  But it would be possible to default to either keeping or
> squashing the duplicates.

Sure, there are corner cases that might mess up the whole structure, 
which at the moment is not very foolproof, but I'm trying to test the 
idea and see what I can come up with. Once the flow is in place I could 
think about a more reliable approach, such as an interface to a database.
 
>> p.s.: I'm not extremely proficient in python, actually I'm just
>> starting with it!
> 
> Well, you've come to the right place. Most of us are pretty fond of
> Python here. :-)

I've never understood people discarding newsgroups in favor of more 
'recent' technologies like social networks. Long live the USENET!
