Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #56688 > unrolled thread
| Started by | Starriol <juanscopp@gmail.com> |
|---|---|
| First post | 2013-10-11 08:01 -0700 |
| Last post | 2013-10-11 23:40 +0100 |
| Articles | 6 — 4 participants |
Back to article view | Back to comp.lang.python
Consolidate several lines of a CSV file with firewall rules Starriol <juanscopp@gmail.com> - 2013-10-11 08:01 -0700
Re: Consolidate several lines of a CSV file with firewall rules Joel Goldstick <joel.goldstick@gmail.com> - 2013-10-11 11:47 -0400
Re: Consolidate several lines of a CSV file with firewall rules Tim Chase <python.list@tim.thechases.com> - 2013-10-11 15:40 -0500
Re: Consolidate several lines of a CSV file with firewall rules [PS] Tim Chase <python.list@tim.thechases.com> - 2013-10-11 15:50 -0500
Re: Consolidate several lines of a CSV file with firewall rules [PS] Starriol <juanscopp@gmail.com> - 2013-10-11 14:22 -0700
Re: Consolidate several lines of a CSV file with firewall rules [PS] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-11 23:40 +0100
| From | Starriol <juanscopp@gmail.com> |
|---|---|
| Date | 2013-10-11 08:01 -0700 |
| Subject | Consolidate several lines of a CSV file with firewall rules |
| Message-ID | <9190d949-7b03-4579-8d46-c42afcc9d0fa@googlegroups.com> |
Hi guys. I have a CSV file, which I created using an HTML export from a Check Point firewall policy. Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or services. I need the output to have each rule described in only one line. It's easy to distinguish when each rule begins. In the first column, there's the rule ID, which is a number. Let me show you an example: NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT 1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment suppressed* ;;;;;igmp**;;;;; 2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management FWg;*Comment suppressed* ;;fwmgmpe**;fwmgmpe**;;ssh**;;;;; ;;fwmgm**;fwmgm**;;;;;;; 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment suppressed* ;;;cmm_ntpserver_pe02**;;;;;;; What I need ,explained in pseudo code, is this: Read the first column of the next line. If there's a number: Evaluate the first column of the next line. If there's no number there, concatenate (separating with a comma) \ the strings in the columns of this line with the last one and eliminate the text in the current one The output should be something like this: NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT 1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment suppressed* ;;;;;;;;;; ;;;;;;;;;; 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment suppressed* ;;;;;;;;;; The empty lines are there only to be more clear, I don't actually need them. Thanks!
[toc] | [next] | [standalone]
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-10-11 11:47 -0400 |
| Message-ID | <mailman.1006.1381506463.18130.python-list@python.org> |
| In reply to | #56688 |
On Fri, Oct 11, 2013 at 11:01 AM, Starriol <juanscopp@gmail.com> wrote: > Hi guys. > I have a CSV file, which I created using an HTML export from a Check Point firewall policy. > Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or services. > I need the output to have each rule described in only one line. > It's easy to distinguish when each rule begins. In the first column, there's the rule ID, which is a number. > > Let me show you an example: > > NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT > 1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment suppressed* > ;;;;;igmp**;;;;; > 2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management FWg;*Comment suppressed* > ;;fwmgmpe**;fwmgmpe**;;ssh**;;;;; > ;;fwmgm**;fwmgm**;;;;;;; > 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment suppressed* > ;;;cmm_ntpserver_pe02**;;;;;;; > > What I need ,explained in pseudo code, is this: > > Read the first column of the next line. If there's a number: > Evaluate the first column of the next line. If there's no number there, concatenate (separating with a comma) \ > the strings in the columns of this line with the last one and eliminate the text in the current one > > The output should be something like this: > > NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT > 1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment suppressed* > ;;;;;;;;;; > ;;;;;;;;;; > 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment suppressed* > ;;;;;;;;;; > > The empty lines are there only to be more clear, I don't actually need them. > > Thanks! > -- > https://mail.python.org/mailman/listinfo/python-list I think you posted twice, and perhaps in html? Its hard to read. At any rate, there is a csv module in python that will let you gather your data in a list of lists. With that you can iterate through the csv rows, saving rows with a number in the first position. Iterate and append the rows below that until you run into another row with a number in the first position. Why don't you write some code, see how it goes, copy and paste the code back here with full traceback if you get an error or with your results if you have some. Do it for a subset of a couple of rows of input data. -- Joel Goldstick http://joelgoldstick.com
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2013-10-11 15:40 -0500 |
| Message-ID | <mailman.1016.1381523955.18130.python-list@python.org> |
| In reply to | #56688 |
On 2013-10-11 08:01, Starriol wrote:
> NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL
> ON;TIME;COMMENT
> 1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment
> suppressed* ;;;;;igmp**;;;;;
> 2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management
> FWg;*Comment
> suppressed* ;;fwmgmpe**;fwmgmpe**;;ssh**;;;;; ;;fwmgm**;fwmgm**;;;;;;;
> 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment
> suppressed* ;;;cmm_ntpserver_pe02**;;;;;;;
> What I need ,explained in pseudo code, is this:
>
> Read the first column of the next line. If there's a number:
> Evaluate the first column of the next line. If there's no
> number there, concatenate (separating with a comma) \ the strings
> in the columns of this line with the last one and eliminate the
> text in the current one
>
> The output should be something like this:
>
> NO.;NAME;SOURCE;DESTINATION;VPN ;SERVICE;ACTION;TRACK;INSTALL
> ON;TIME;COMMENT
> 1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment
> suppressed* ;;;;;;;;;; ;;;;;;;;;;
> 3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment
> suppressed* ;;;;;;;;;;
>
> The empty lines are there only to be more clear, I don't actually
> need them.
Though there are a couple oddities in your source data, I'll take a
crack at it. First, there are the dangling open-quotes on #1 that
cause most CSV parsers (I tested both Gnumeric and Python's csv
module) to read until the subsequent line is read. If this is
intentional, then it's all good. If not, you get weird behaviors.
That said, you can try the code below (adjusting the 3 lines for
your desired filenames and whether you *want* to write emptied rows)
to see if it gets you what you want:
##########################################################
import csv
# adjust these 3 lines
WRITE_EMPTIES = True
INFILE = "input.txt"
OUTFILE = "output.txt"
with open(INFILE, "r") as in_file:
r = csv.reader(in_file, delimiter=";")
with open(OUTFILE, "wb") as out_file:
previous = None
empties_to_write = 0
out_writer = csv.writer(out_file, delimiter=";")
for i, row in enumerate(r):
first_val = row[0].strip()
if first_val:
if previous:
out_writer.writerow(previous)
if WRITE_EMPTIES and empties_to_write:
out_writer.writerows(
[["" for _ in previous]] * empties_to_write
)
empties_to_write = 0
previous = row
else: # append sub-portions to each other
previous = [
",".join(
subitem
for subitem in existing.split(",") + [new]
if subitem
)
for existing, new in zip(previous, row)
]
empties_to_write += 1
if previous: # take care of the last row
out_writer.writerow(previous)
if WRITE_EMPTIES and empties_to_write:
out_writer.writerows(
[["" for _ in previous]] * empties_to_write
)
####################################################
Hope this helps,
-tkc
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2013-10-11 15:50 -0500 |
| Subject | Re: Consolidate several lines of a CSV file with firewall rules [PS] |
| Message-ID | <mailman.1017.1381524510.18130.python-list@python.org> |
| In reply to | #56688 |
On 2013-10-11 15:40, Tim Chase wrote: > the dangling open-quotes on #1 that cause most CSV parsers to read > until the subsequent line is read. And by "subsequent line", I mean "subsequent closing-quote" of course. :-) -tkc
[toc] | [prev] | [next] | [standalone]
| From | Starriol <juanscopp@gmail.com> |
|---|---|
| Date | 2013-10-11 14:22 -0700 |
| Subject | Re: Consolidate several lines of a CSV file with firewall rules [PS] |
| Message-ID | <4a2ba266-a741-48e3-bbea-043b0152a9f6@googlegroups.com> |
| In reply to | #56709 |
On Friday, October 11, 2013 5:50:06 PM UTC-3, Tim Chase wrote: > On 2013-10-11 15:40, Tim Chase wrote: > > > the dangling open-quotes on #1 that cause most CSV parsers to read > > > until the subsequent line is read. > > > > And by "subsequent line", I mean "subsequent closing-quote" of > > course. :-) > > > > -tkc Ha, thanks a million, Tim! That worked great! Hopefully soon enough I'm going to be able to write it on my own without having to bother you guys!
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-10-11 23:40 +0100 |
| Subject | Re: Consolidate several lines of a CSV file with firewall rules [PS] |
| Message-ID | <mailman.1023.1381531269.18130.python-list@python.org> |
| In reply to | #56712 |
On 11/10/2013 22:22, Starriol wrote: > On Friday, October 11, 2013 5:50:06 PM UTC-3, Tim Chase wrote: >> On 2013-10-11 15:40, Tim Chase wrote: >> >>> the dangling open-quotes on #1 that cause most CSV parsers to read >> >>> until the subsequent line is read. >> >> >> >> And by "subsequent line", I mean "subsequent closing-quote" of >> >> course. :-) >> >> >> >> -tkc > > Ha, thanks a million, Tim! > That worked great! > Hopefully soon enough I'm going to be able to write it on my own without having to bother you guys! > If in future you do have to bother us we'll not object, but you'll certainly be less bother if you read and digest this first https://wiki.python.org/moin/GoogleGroupsPython :) -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web