Groups > comp.lang.python > #8210 > unrolled thread
| Started by | Andy Barnes <andy.barnes@gmail.com> |
|---|---|
| First post | 2011-06-22 07:00 -0700 |
| Last post | 2011-06-22 21:50 -0700 |
| Articles | 5 — 5 participants |
Python Regular Expressions Andy Barnes <andy.barnes@gmail.com> - 2011-06-22 07:00 -0700
Re: Python Regular Expressions Andy Barnes <andy.barnes@gmail.com> - 2011-06-22 07:26 -0700
Re: Python Regular Expressions Neil Cerutti <neilc@norwich.edu> - 2011-06-22 14:58 +0000
Re: Python Regular Expressions Peter Otten <__peter__@web.de> - 2011-06-22 17:05 +0200
Re: Python Regular Expressions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2011-06-22 21:50 -0700
| From | Andy Barnes <andy.barnes@gmail.com> |
|---|---|
| Date | 2011-06-22 07:00 -0700 |
| Subject | Python Regular Expressions |
| Message-ID | <3b65fd1f-d377-49b8-b00a-f80462a1213d@35g2000prp.googlegroups.com> |
Hi,
I am hoping someone here can help me with a problem I'd like to
resolve with Python. I have used it before for some other projects but
have never needed to use Regular Expressions before. It's quite
possible I am following completely the wrong tack for this task (so
any advice appreciated).
I have a source file in csv format that follows certain rules. I
basically want to parse the source file and spit out a second file
built from some rules and the content of the first file.
Source File Format:
Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers
Distil Mana, Lore, n/a, 70, 38, 21
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
Talismantic Lore, Lore, n/a, 150, 100, 50
Advanced Talismantic Lore, Lore, n/a, 100, 60, 30, Talismantic Lore, Theurgic Lore
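Every data row has six fixed fields followed by zero or more prerequisites. Since the question was about regular expressions: a regex is only needed, if at all, to tokenise a line. A minimal sketch, assuming fields are separated by a comma plus optional spaces and that each row sits on one physical line (`split_row` is a hypothetical helper, not something from the thread):

```python
import re


def split_row(line):
    """Split one source row into its six fixed fields and a prerequisite list."""
    # A comma with optional whitespace on either side separates fields;
    # strip() first so the trailing newline doesn't end up in the last field.
    fields = re.split(r"\s*,\s*", line.strip())
    # Star-unpacking collects however many trailing fields remain (0, 1, 2, ...).
    # Rows with fewer than six fields raise ValueError here.
    name, type_, create, study, read, teach, *prereqs = fields
    # Drop empty fields left behind by a trailing comma.
    prereqs = [p for p in prereqs if p]
    return (name, type_, create, study, read, teach), prereqs


fixed, prereqs = split_row("Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana")
print(fixed)    # ('Theurgic Lore', 'Lore', 'n/a', '105', '70', '30')
print(prereqs)  # ['Distil Mana']
```

The same function covers the no-prerequisite and two-prerequisite rows, because the starred name simply receives an empty or longer list.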
The input file has over 700 unique entries. I have tried to
cover the four main cases above. Before I detail them, this is
what I would like the above input file to be output as (the DOT
graph description language, in case anyone recognises it):
Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers
DistilMana [label="{ Distil Mana |{Lore|n/a}|{70|38|21}}"];
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;
TalismanticLore [label="{ Talismantic Lore |{Lore|n/a}|{150|100|50}}"];
AdvancedTalismanticLore [label="{ Advanced Talismantic Lore |{Lore|n/a}|{100|60|30}}"];
TalismanticLore -> AdvancedTalismanticLore;
TheurgicLore -> AdvancedTalismanticLore;
It's quite a complicated find and replace operation, but it can be broken
down into some easy stages. The main thing the sample above shows is
that some of the entries won't list any prerequisites - these only need
the descriptor entry created. Some of them have more than one
prerequisite. A line is needed for each prerequisite listed, linking
it to its parent.
You can also see that the 'name' needs to have spaces removed, and it's
repeated a few times in the process. I hope it's easy to see what I am
trying to achieve from the above. I'd be very happy to accept
assistance in automating the conversion of my ever-expanding CSV file
into the DOT format described above.
Andy
| From | Andy Barnes <andy.barnes@gmail.com> |
|---|---|
| Date | 2011-06-22 07:26 -0700 |
| Message-ID | <77c4973c-7315-4b4d-8eae-3f5770dfb530@22g2000prx.googlegroups.com> |
| In reply to | #8210 |
to expand. I have parsed one of the lines manually to try to break down
the process I'm trying to automate.
source:
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
output:
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;
These are the steps I would take to do this conversion manually:
1) Take everything prior to the first comma, remove all the spaces and
put it on a new line:
TheurgicLore
2) append the following string ' [label="{ '
TheurgicLore [label="{
3) append everything prior to the first comma (this time we don't need
to remove the spaces)
TheurgicLore [label="{ Theurgic Lore
4) append the following string ' |{'
TheurgicLore [label="{ Theurgic Lore |{
5) append everything between the 1st and 2nd comma of the source file
followed by a '|'
TheurgicLore [label="{ Theurgic Lore |{Lore|
6) append everything between the 2nd and 3rd comma of the source file
followed by a '}|{'
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{
7) append everything between the 3rd and 4th comma of the source file
followed by a '|'
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|
8) append everything between the 4th and 5th comma of the source file
followed by a '|'
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|
9) append everything between the 5th and 6th comma of the source file
followed by a '}}"];'
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
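Steps 1 to 9 translate almost directly into string concatenation. A sketch, assuming the row is split on plain commas and each field is stripped of the surrounding spaces (`node_line` is a hypothetical name); each comment maps a piece back to the manual step it mirrors:

```python
def node_line(line):
    """Build the DOT descriptor line for one source row (steps 1-9)."""
    # Split on commas and trim the spaces that follow each comma.
    fields = [f.strip() for f in line.split(",")]
    name, type_, create, study, read, teach = fields[:6]
    node_id = "".join(name.split())  # step 1: name with spaces removed
    parts = [
        node_id, ' [label="{ ',      # steps 1-2
        name, " |{",                 # steps 3-4
        type_, "|",                  # step 5
        create, "}|{",               # step 6
        study, "|",                  # step 7
        read, "|",                   # step 8
        teach, '}}"];',              # step 9
    ]
    return "".join(parts)


print(node_line("Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana"))
# TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
```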
Those 9 steps produce the first line of my output file, as above:
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
I now have to parse the dependencies onto a new line.
# This next process needs to be repeated for each prerequisite, so if
there are two prerequisites it would need to keep parsing for more
commas.
1a) take everything after the 6th comma (up to the 7th comma, if there
is one) and put it at the start of a new line (remove spaces)
DistilMana
2a) append ' -> '
DistilMana ->
3a) append everything prior to the first comma, with spaces removed,
followed by ';'
DistilMana -> TheurgicLore;
This should now be all the steps to spit out:
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;
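The repeated steps 1a to 3a amount to a loop over every field after the sixth comma. A sketch under the same assumptions as above (`edge_lines` is a hypothetical name); it returns an empty list for rows with no prerequisites and one line per prerequisite otherwise:

```python
def edge_lines(line):
    """Build one 'Prereq -> Name;' line per prerequisite (steps 1a-3a)."""
    fields = [f.strip() for f in line.split(",")]
    child = "".join(fields[0].split())  # step 3a: name without spaces
    lines = []
    # Every field after the 6th comma is a prerequisite; skip empty
    # fields left by a trailing comma.
    for prereq in fields[6:]:
        if prereq:
            # steps 1a-3a: prerequisite without spaces, ' -> ', name, ';'
            lines.append("".join(prereq.split()) + " -> " + child + ";")
    return lines


for out in edge_lines("Advanced Talismantic Lore, Lore, n/a, 100, 60, 30, "
                      "Talismantic Lore, Theurgic Lore"):
    print(out)
```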
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2011-06-22 14:58 +0000 |
| Message-ID | <96ee9bFlrrU1@mid.individual.net> |
| In reply to | #8213 |
On 2011-06-22, Andy Barnes <andy.barnes@gmail.com> wrote:
> to expand. I have parsed one of the lines manually to try to break down
> the process I'm trying to automate.
>
> source:
> Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
>
> output:
> TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
> DistilMana -> TheurgicLore;
>
> These are the steps I would take to do this conversion manually:
It seems to me that parsing the file into an intermediate model
and then using that model to serialize your output would be
easier to understand and more robust than modifying the csv
entries in place. It decouples deciphering the meaning of the
data from emitting the data, which also makes it easier to extend.
The amount of ingenuity required is less, though. ;)
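Neil's suggestion, sketched under assumptions: a `Skill` dataclass as the intermediate model, one function that only reads and one that only writes. The class and function names, the header-detection rule and the `digraph`/`node [shape=record]` wrapper (Graphviz needs a record shape to draw the `{...|...}` labels as fields) are illustrative choices, not part of the thread:

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Intermediate model of one source row (hypothetical names)."""
    name: str
    type_: str
    create: str
    study: str
    read: str
    teach: str
    prerequisites: list = field(default_factory=list)

    @property
    def node_id(self):
        # DOT identifier: the name with all whitespace removed
        return "".join(self.name.split())


def parse(lines):
    """Reading stage: turn source lines into Skill objects."""
    skills = []
    for line in lines:
        # Skip blank lines, '#' comments and the header row
        # (assumed to start with "Name,").
        if not line.strip() or line.startswith("#") or line.startswith("Name,"):
            continue
        fields = [f.strip() for f in line.split(",")]
        # First six fields are fixed; any non-empty rest are prerequisites.
        skills.append(Skill(*fields[:6],
                            prerequisites=[p for p in fields[6:] if p]))
    return skills


def to_dot(skills):
    """Writing stage: serialize the model as a DOT digraph."""
    # The digraph wrapper and record shape are assumptions about the
    # final file; the sample output shows only node and edge lines.
    out = ["digraph skills {", "  node [shape=record];"]
    for s in skills:
        out.append('  %s [label="{ %s |{%s|%s}|{%s|%s|%s}}"];' % (
            s.node_id, s.name, s.type_, s.create, s.study, s.read, s.teach))
        for p in s.prerequisites:
            out.append("  %s -> %s;" % ("".join(p.split()), s.node_id))
    out.append("}")
    return "\n".join(out)
```

Because `parse` knows nothing about DOT, emitting a different format would only need another writer alongside `to_dot`.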
--
Neil Cerutti
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2011-06-22 17:05 +0200 |
| Message-ID | <mailman.281.1308755152.1164.python-list@python.org> |
| In reply to | #8210 |
Andy Barnes wrote:
> You can also see that the 'name' needs to have spaces removed, and it's
> repeated a few times in the process. I hope it's easy to see what I am
> trying to achieve from the above. I'd be very happy to accept
> assistance in automating the conversion of my ever-expanding CSV file
> into the DOT format described above.
Forget about regexes. If there's any complexity it's in writing the output
rather than reading the input file. You can tackle that by putting your data
into a dictionary and using a format string:
# Python 2 (print statement, StringIO module)
import sys

def camelized(s):
    # Remove all whitespace from a name: "Distil Mana" -> "DistilMana"
    return "".join(s.split())

# One DOT record node per row; the keys are the lower-cased column
# headers plus "camel", which process() adds.
template = ('%(camel)s [label="{ %(name)s |{%(type)s|%(create)s}|'
            '{%(study)s|%(read)s|%(teach)s}}"];')

def process(instream, outstream):
    # Skip blank lines and '#' comment lines
    instream = (line for line in instream if not (line.isspace() or
                                                  line.startswith("#")))
    # Split each line on commas and strip the surrounding whitespace
    rows = (map(str.strip, line.split(",")) for line in instream)
    headers = map(str.lower, next(rows))
    for row in rows:
        rowdict = dict(zip(headers, row))
        camel = rowdict["camel"] = camelized(rowdict["name"])
        print >> outstream, template % rowdict
        # Every field from the Prerequisite column onwards is a prerequisite
        for for_lack_of_better_name in row[len(headers)-1:]:
            print >> outstream, "%s -> %s;" % (
                camelized(for_lack_of_better_name), camel)

if __name__ == "__main__":
    from StringIO import StringIO
    instream = StringIO("""\
Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers
Distil Mana, Lore, n/a, 70, 38, 21
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
Talismantic Lore, Lore, n/a, 150, 100, 50
Advanced Talismantic Lore, Lore, n/a, 100, 60, 30, Talismantic Lore, Theurgic Lore
""")
    process(instream, sys.stdout)
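Peter's code is Python 2 (`print` statement, `StringIO` module). A possible Python 3 port of the same approach follows. The main trap is that `map()` returns a lazy iterator in Python 3, so `len(headers)` would fail and `zip()` would exhaust the headers on the first row; lists are built instead. The guard against empty prerequisite fields is an addition, not part of Peter's post:

```python
import sys
from io import StringIO


def camelized(s):
    # Remove all whitespace from a name: "Distil Mana" -> "DistilMana"
    return "".join(s.split())


template = ('%(camel)s [label="{ %(name)s |{%(type)s|%(create)s}|'
            '{%(study)s|%(read)s|%(teach)s}}"];')


def process(instream, outstream):
    # Drop blank lines and '#' comment lines.
    lines = (line for line in instream
             if not (line.isspace() or line.startswith("#")))
    # Build real lists: Python 3's map() would give one-shot iterators.
    rows = ([f.strip() for f in line.split(",")] for line in lines)
    headers = [h.lower() for h in next(rows)]
    for row in rows:
        rowdict = dict(zip(headers, row))
        camel = rowdict["camel"] = camelized(rowdict["name"])
        print(template % rowdict, file=outstream)
        # Every field from the Prerequisite column on is one prerequisite;
        # skip empty fields left by a trailing comma.
        for prereq in row[len(headers) - 1:]:
            if prereq:
                print("%s -> %s;" % (camelized(prereq), camel), file=outstream)


if __name__ == "__main__":
    sample = StringIO("""\
Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers
Distil Mana, Lore, n/a, 70, 38, 21
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
""")
    process(sample, sys.stdout)
```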
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2011-06-22 21:50 -0700 |
| Message-ID | <mailman.312.1308804907.1164.python-list@python.org> |
| In reply to | #8210 |
On Wed, 22 Jun 2011 07:00:42 -0700 (PDT), Andy Barnes
<andy.barnes@gmail.com> declaimed the following in
gmane.comp.python.general:
>
> I have a source file in csv format that follows certain rules. I
So why not use the csv module to read and split the fields...
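Dennis's suggestion, sketched with the standard `csv` module. `skipinitialspace=True` drops the blank after each comma, and the module also copes with quoted fields that contain commas, which a plain `split(",")` does not. The helper name `read_rows` and the assumption that the header is always the first line are illustrative:

```python
import csv
import io


def read_rows(f):
    """Yield (fixed_fields, prerequisites) for each data row."""
    # skipinitialspace=True removes the space after each delimiter,
    # so " Lore" comes back as "Lore".
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row (assumed to be the first line)
    for row in reader:
        # csv yields [] for blank lines; also skip '#' comment lines.
        if not row or row[0].startswith("#"):
            continue
        yield row[:6], [p for p in row[6:] if p]


data = io.StringIO("Name, Type, Create, Study, Read, Teach, Prerequisite\n"
                   "# column headers\n"
                   "Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana\n")
for fixed, prereqs in read_rows(data):
    print(fixed, prereqs)
```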
> It's quite a complicated find and replace operation, but it can be broken
> down into some easy stages. The main thing the sample above shows is
> that some of the entries won't list any prerequisites - these only need
> the descriptor entry created. Some of them have more than one
> prerequisite. A line is needed for each prerequisite listed, linking
> it to its parent.
>
That's output formatting; I won't bother trying to create an
algorithm for that...
> You can also see that the 'name' needs to have spaces removed, and it's
> repeated a few times in the process. I hope it's easy to see what I am
strippedName = "".join(originalName.split())
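The one-liner works because `str.split()` with no argument splits on any run of whitespace and discards empty pieces. A small check of how that differs from `str.replace(" ", "")` (the function name `stripped` is hypothetical):

```python
def stripped(name):
    # split() with no argument splits on any whitespace run (spaces,
    # tabs, repeated blanks); joining the pieces removes all of it.
    return "".join(name.split())


print(stripped("Advanced Talismantic Lore"))        # AdvancedTalismanticLore
print(repr(stripped("Theurgic  Lore\t")))           # 'TheurgicLore'
# replace() only removes literal spaces, so the tab survives:
print(repr("Theurgic  Lore\t".replace(" ", "")))    # 'TheurgicLore\t'
```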
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/