| Started by | Kashif Rana <kashifrana84@gmail.com> |
|---|---|
| First post | 2015-04-29 13:42 -0700 |
| Last post | 2015-08-19 12:53 -0700 |
| Articles | 8 (7 participants) |
Python re to extract useful information from each line Kashif Rana <kashifrana84@gmail.com> - 2015-04-29 13:42 -0700
Re: Python re to extract useful information from each line Kashif Rana <kashifrana84@gmail.com> - 2015-04-29 13:49 -0700
Re: Python re to extract useful information from each line Emile van Sebille <emile@fenx.com> - 2015-04-29 14:22 -0700
Re: Python re to extract useful information from each line MRAB <python@mrabarnett.plus.com> - 2015-04-29 22:28 +0100
Re: Python re to extract useful information from each line Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-04-29 22:30 +0100
Re: Python re to extract useful information from each line Tim Chase <python.list@tim.thechases.com> - 2015-04-29 17:38 -0500
Re: Python re to extract useful information from each line sohcahtoa82@gmail.com - 2015-04-29 16:29 -0700
Re: Python re to extract useful information from each line Paul McGuire <ptmcg@austin.rr.com> - 2015-08-19 12:53 -0700
| From | Kashif Rana <kashifrana84@gmail.com> |
|---|---|
| Date | 2015-04-29 13:42 -0700 |
| Subject | Python re to extract useful information from each line |
| Message-ID | <e5473ccc-4f7d-431d-93a7-1aeeededcbf0@googlegroups.com> |
Hello Experts

I have the lines below, with some variations.

1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" nat dst ip 10.10.10.10 port 8000 permit log

2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" nat src permit schedule "14August2014" log

3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" nat src dip-id 4 permit log

4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny

Please help me write a regular expression to extract the information shown in parentheses below, if it exists, from each line. Note that some items, such as nat or log, may or may not be present.

set policy id (id) from (from) to (to) (source) (destination) (service) nat (src or dst) (dip-id 4) or (ip 10.10.10.10) port (dst-port) (action) schedule (schedule) (log)
| From | Kashif Rana <kashifrana84@gmail.com> |
|---|---|
| Date | 2015-04-29 13:49 -0700 |
| Message-ID | <220dafbc-25f0-48a7-b37a-c8a77a6f2ffa@googlegroups.com> |
| In reply to | #89573 |
On Thursday, April 30, 2015 at 12:42:18 AM UTC+4, Kashif Rana wrote:
> Hello Experts
>
> I have the lines below, with some variations.
>
> 1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" nat dst ip 10.10.10.10 port 8000 permit log
>
> 2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" nat src permit schedule "14August2014" log
>
> 3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" nat src dip-id 4 permit log
>
> 4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny
>
> Please help me write a regular expression to extract the information shown in parentheses below, if it exists, from each line. Note that some items, such as nat or log, may or may not be present.
>
> set policy id (id) from (from) to (to) (source) (destination) (service) nat (src or dst) (dip-id 4) or (ip 10.10.10.10) port (dst-port) (action) schedule (schedule) (log)
I tried the regex below, and it's not working:
id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:\s(?P<p_nat_status>nat)\s(?P<p_nat_type>\w+)(\s?P<p_nat_src_ip>dip-id\s\d+)?(\sip\s(?P<p_nat_dst_ip>[\d\.]+)\sport(?P<dst_nat_port>\d+))?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$
If I ignore line 1, the regex below works and gives me all the info:
pol_elements = re.compile('id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:(?P<p_nat_status>\snat)\s(?P<p_nat_type>[^\s]+?)(?P<p_nat_ip>\sdip-id\s[^\s]+?)?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$'
)
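Testing the NAT fragments of the first (failing) pattern in isolation shows why lines 1 and 3 cannot match. A minimal sketch using only the standard `re` module, with the fragments copied into raw strings; `line1` and `line3` are just the tail ends of sample lines 1 and 3:

```python
import re

# Fragment 1 from the failing pattern: no whitespace allowed between
# "port" and the number, so " port 8000" can never match.
port_bad = re.compile(r"\sport(?P<dst_nat_port>\d+)")
# Fragment 2: "(\s?P<...>" lacks the "(?" prefix, so it is an ordinary
# capture group that must match the literal text "P<p_nat_src_ip>dip-id".
dip_bad = re.compile(r"(\s?P<p_nat_src_ip>dip-id\s\d+)")

# The same fragments with the intended syntax.
port_ok = re.compile(r"\sport\s(?P<dst_nat_port>\d+)")
dip_ok = re.compile(r"\s(?P<p_nat_src_ip>dip-id\s\d+)")

line1 = "nat dst ip 10.10.10.10 port 8000 permit log"
line3 = "nat src dip-id 4 permit log"

print(port_bad.search(line1))                        # None
print(port_ok.search(line1).group("dst_nat_port"))   # 8000
print(dip_bad.search(line3))                         # None
print(dip_ok.search(line3).group("p_nat_src_ip"))    # dip-id 4
```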
| From | Emile van Sebille <emile@fenx.com> |
|---|---|
| Date | 2015-04-29 14:22 -0700 |
| Message-ID | <mailman.98.1430342578.3680.python-list@python.org> |
| In reply to | #89574 |
On 4/29/2015 1:49 PM, Kashif Rana wrote:
> pol_elements = re.compile('id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:(?P<p_nat_status>\snat)\s(?P<p_nat_type>[^\s]+?)(?P<p_nat_ip>\sdip-id\s[^\s]+?)?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$'
> )
... and that's why we avoid regular expressions... it makes my head hurt
just looking at that line noise.
Emile
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2015-04-29 22:28 +0100 |
| Message-ID | <mailman.99.1430342891.3680.python-list@python.org> |
| In reply to | #89574 |
On 2015-04-29 22:22, Emile van Sebille wrote:
> On 4/29/2015 1:49 PM, Kashif Rana wrote:
>> pol_elements = re.compile('id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:(?P<p_nat_status>\snat)\s(?P<p_nat_type>[^\s]+?)(?P<p_nat_ip>\sdip-id\s[^\s]+?)?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$'
>> )
>
>
> ... and that's why we avoid regular expressions... it makes my head hurt
> just looking at that line noise.
>
It might just be easier to split it into a list of fields and then pick
out the ones you want:
fields = re.findall(r'"[^"]+"|\S+', line)
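A sketch of what that looks like on the first sample line, assuming each line starts at `set` (i.e. without the `1-` numbering used in the post) and that the leading fields always sit at the same positions, as they do in all four samples. The variable names are illustrative:

```python
import re

line = ('set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" '
        'nat dst ip 10.10.10.10 port 8000 permit log')

# Either a double-quoted string (quotes included) or a run of non-whitespace.
fields = re.findall(r'"[^"]+"|\S+', line)

# set policy id <id> from <from> to <to> <source> <destination> <service> ...
policy_id = fields[3]
from_zone, to_zone = fields[5].strip('"'), fields[7].strip('"')
source, destination, service = (f.strip('"') for f in fields[8:11])
rest = fields[11:]  # the optional nat/action/schedule/log part

print(policy_id, from_zone, to_zone, source, destination, service)
# 1000 Untrust Trust Any 1.1.1.1 HTTP
print(rest)
# ['nat', 'dst', 'ip', '10.10.10.10', 'port', '8000', 'permit', 'log']
```

The variable-length tail in `rest` can then be handled with ordinary `if` tests on its keywords instead of one large pattern.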
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-04-29 22:30 +0100 |
| Message-ID | <mailman.100.1430343053.3680.python-list@python.org> |
| In reply to | #89574 |
On 29/04/2015 22:22, Emile van Sebille wrote:
> On 4/29/2015 1:49 PM, Kashif Rana wrote:
>> pol_elements =
>> re.compile('id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:(?P<p_nat_status>\snat)\s(?P<p_nat_type>[^\s]+?)(?P<p_nat_ip>\sdip-id\s[^\s]+?)?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$'
>>
>> )
>
>
> ... and that's why we avoid regular expressions... it makes my head hurt
> just looking at that line noise.
>
> Emile
>
Great minds think alike :)
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-04-29 17:38 -0500 |
| Message-ID | <mailman.105.1430349368.3680.python-list@python.org> |
| In reply to | #89574 |
On 2015-04-29 14:22, Emile van Sebille wrote:
> On 4/29/2015 1:49 PM, Kashif Rana wrote:
> > pol_elements =
> > re.compile('id\s(?P<p_id>.+?)(?:\sname\s(?P<p_name>.+?))?\sfrom\s(?P<p_from>.+?)\sto\s(?P<p_to>.+?)\s{2}(?P<p_src>[^\s]+?)\s(?P<p_dst>[^\s]+?)\s(?P<p_port>[^\s]+?)(?:(?P<p_nat_status>\snat)\s(?P<p_nat_type>[^\s]+?)(?P<p_nat_ip>\sdip-id\s[^\s]+?)?)?\s(?P<p_action>[^\s]+?)(?:\sschedule\s(?P<p_schedule>[^\s]+?))?(?P<p_log_status>\slog)?$'
> > )
>
> ... and that's why we avoid regular expressions... it makes my head
> hurt just looking at that line noise.
First, it appears the OP isn't using raw strings, so those
backslashes are just asking for trouble.
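A small illustration of that trouble: escapes Python does not recognise, such as `\s` and `\d`, happen to survive in an ordinary string literal (recent Python 3 versions warn about them), but escapes it does recognise are silently converted before `re` ever sees them. `\b` is the classic case:

```python
import re

# In an ordinary literal "\b" is a backspace character (\x08),
# so the pattern looks for backspaces around "nat".
plain = re.search('\bnat\b', 'nat src dip-id 4')
# In a raw string the backslash survives and \b means "word boundary".
raw = re.search(r'\bnat\b', 'nat src dip-id 4')

print(plain)        # None
print(raw.group())  # nat
```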
That said, it would be a lot better if the OP made use of re.VERBOSE
to put each component on its own line:
pol_elements = re.compile(r"""
id
\s
(?P<p_id>.+?)
(?:
\s
name
\s
(?P<p_name>.+?)
)?
\s
from
\s
(?P<p_from>.+?)
\s
to
\s
(?P<p_to>.+?)
\s{2}
(?P<p_src>[^\s]+?)
\s
(?P<p_dst>[^\s]+?)
\s(?P<p_port>[^\s]+?)
(?:
\s
(?P<p_nat_status>nat)
\s
(?P<p_nat_type>\w+)
(
\s?
P<p_nat_src_ip>dip-id
\s
\d+
)?
(
\s
ip
\s
(?P<p_nat_dst_ip>[\d\.]+)
\s
port
(?P<dst_nat_port>\d+)
)?
)?
\s
(?P<p_action>[^\s]+?)
(?:
\s
schedule
\s
(?P<p_schedule>[^\s]+?)
)?
(?P<p_log_status>\slog)?
$
""", re.VERBOSE)
which, with some copious comments in the expression, would make it
almost readable.
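Following that suggestion, here is a minimal sketch of the first pattern in `re.VERBOSE` form with comments, and with its two NAT-clause bugs fixed (the missing `(?` before `P<p_nat_src_ip>` and the missing `\s` after `port`). Several choices are assumptions rather than anything from the thread: fields are separated by `\s+` instead of `\s{2}` so the single-spaced sample lines as posted match, quoted values contain no embedded quotes (the quotes are left out of the captured text), `p_action` is limited to the `permit`/`deny` seen in the samples, and group names such as `p_service` and `p_log` are illustrative. It has only been checked against the four sample lines.

```python
import re

POLICY_RE = re.compile(r"""
    set \s+ policy \s+ id \s+ (?P<p_id>\d+)
    (?: \s+ name \s+ "(?P<p_name>[^"]*)" )?        # optional policy name
    \s+ from \s+ "(?P<p_from>[^"]*)"               # source zone
    \s+ to \s+ "(?P<p_to>[^"]*)"                   # destination zone
    \s+ "(?P<p_src>[^"]*)"                         # source address
    \s+ "(?P<p_dst>[^"]*)"                         # destination address
    \s+ "(?P<p_service>[^"]*)"                     # service
    (?:                                            # optional NAT clause
        \s+ (?P<p_nat>nat) \s+ (?P<p_nat_type>src|dst)
        (?: \s+ (?P<p_nat_src_ip>dip-id \s+ \d+) )?          # e.g. dip-id 4
        (?: \s+ ip \s+ (?P<p_nat_dst_ip>\d+(?:\.\d+){3})     # e.g. ip 10.10.10.10
            (?: \s+ port \s+ (?P<p_nat_dst_port>\d+) )? )?   # e.g. port 8000
    )?
    \s+ (?P<p_action>permit|deny)                  # actions seen in the samples
    (?: \s+ schedule \s+ "(?P<p_schedule>[^"]*)" )?
    (?: \s+ (?P<p_log>log) )?
    \s* $
""", re.VERBOSE)

SAMPLES = [
    '1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" '
    'nat dst ip 10.10.10.10 port 8000 permit log',
    '2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" '
    'nat src permit schedule "14August2014" log',
    '3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" '
    'nat src dip-id 4 permit log',
    '4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny',
]

for line in SAMPLES:
    # search() rather than match() so the "1- " numbering is skipped.
    match = POLICY_RE.search(line)
    # Keep only the named groups that actually took part in the match.
    print({k: v for k, v in match.groupdict().items() if v is not None})
```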
Alternatively, switch to an actual parser like pyparsing.
-tkc
| From | sohcahtoa82@gmail.com |
|---|---|
| Date | 2015-04-29 16:29 -0700 |
| Message-ID | <fdc9aa77-9c75-4fff-8722-5c8a8057ca13@googlegroups.com> |
| In reply to | #89573 |
On Wednesday, April 29, 2015 at 1:42:18 PM UTC-7, Kashif Rana wrote:
> Hello Experts
>
> I have the lines below, with some variations.
>
> 1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" nat dst ip 10.10.10.10 port 8000 permit log
>
> 2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" nat src permit schedule "14August2014" log
>
> 3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" nat src dip-id 4 permit log
>
> 4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny
>
> Please help me write a regular expression to extract the information shown in parentheses below, if it exists, from each line. Note that some items, such as nat or log, may or may not be present.
>
> set policy id (id) from (from) to (to) (source) (destination) (service) nat (src or dst) (dip-id 4) or (ip 10.10.10.10) port (dst-port) (action) schedule (schedule) (log)

If you don't have to worry about spaces in your strings, I'd just use split(). If you DO need to worry about spaces, it'd be trivial to write your own parser that steps through the string one character at a time. The shlex module does this, but it might not work for you; I don't know how it would handle an IP address.
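On the IP-address question, a quick check suggests `shlex.split()` suits these lines: it splits on whitespace only, honours and strips double quotes, and so keeps dotted values such as IP addresses in a single token. A minimal sketch on the first sample line (without its `1-` numbering):

```python
import shlex

line = ('set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" '
        'nat dst ip 10.10.10.10 port 8000 permit log')

# Whitespace-separated tokens; quoted values come back without their quotes.
tokens = shlex.split(line)
print(tokens)
# ['set', 'policy', 'id', '1000', 'from', 'Untrust', 'to', 'Trust', 'Any',
#  '1.1.1.1', 'HTTP', 'nat', 'dst', 'ip', '10.10.10.10', 'port', '8000',
#  'permit', 'log']
```

Quoted values that contain spaces also stay in one token, e.g. `shlex.split('from "My Zone" to "Trust"')` gives `['from', 'My Zone', 'to', 'Trust']`.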
| From | Paul McGuire <ptmcg@austin.rr.com> |
|---|---|
| Date | 2015-08-19 12:53 -0700 |
| Message-ID | <a459bee9-e3ed-4caf-a6dd-67823e818f3d@googlegroups.com> |
| In reply to | #89573 |
Here is a first shot at a pyparsing parser for these lines:
from pyparsing import *
SET,POLICY,ID,FROM,TO,NAT,SRC,DST,IP,PORT,SCHEDULE,LOG,PERMIT,ALLOW,DENY = map(CaselessKeyword,
"SET,POLICY,ID,FROM,TO,NAT,SRC,DST,IP,PORT,SCHEDULE,LOG,PERMIT,ALLOW,DENY".split(','))
integer = Word(nums)
ipAddr = Combine(integer + ('.'+integer)*3)
quotedString.setParseAction(removeQuotes)
logParser = (SET + POLICY + ID + integer("id") +
FROM + quotedString("from_") +
TO + quotedString("to_") + quotedString("service"))
I run this with:
for line in """
1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" nat dst ip 10.10.10.10 port 8000 permit log
2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" nat src permit schedule "14August2014" log
3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" nat src dip-id 4 permit log
4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny
""".splitlines():
    line = line.strip()
    if not line:
        continue
    print((integer + '-' + logParser).parseString(line).dump())
    print()
Getting:
['1', '-', 'SET', 'POLICY', 'ID', '1000', 'FROM', 'Untrust', 'TO', 'Trust', 'Any']
- from_: Untrust
- id: 1000
- service: Any
- to_: Trust
['2', '-', 'SET', 'POLICY', 'ID', '5000', 'FROM', 'Trust', 'TO', 'Untrust', 'Any']
- from_: Trust
- id: 5000
- service: Any
- to_: Untrust
['3', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 'Users']
- from_: Trust
- id: 7000
- service: Users
- to_: Untrust
['4', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 'servers']
- from_: Trust
- id: 7000
- service: servers
- to_: Untrust
Pyparsing provides an Optional class, so you can include expressions for pieces that might be missing, like "... + Optional(NAT + (SRC | DST)) + ...".
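Pyparsing is a third-party package. For anyone who wants to stay in the standard library, the same "optional piece" idea can be sketched by hand: split the line with `shlex` and consume each optional clause only when its keyword comes next. This is a swapped-in technique, not pyparsing's API; `parse_policy` and its result keys are hypothetical names, and trailing tokens after `log` are not checked.

```python
import shlex


def parse_policy(line):
    """Parse one 'set policy' line into a dict (hypothetical helper)."""
    tokens = shlex.split(line)   # quotes removed, IP addresses kept whole
    pos = tokens.index("set")    # skip a leading "1-" style prefix, if any

    def peek():
        # Next token in lower case, or "" at the end of the line.
        return tokens[pos].lower() if pos < len(tokens) else ""

    def take(expected=None):
        # Consume the next token; if `expected` is given, insist on that keyword.
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok.lower() != expected:
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    for keyword in ("set", "policy", "id"):
        take(keyword)
    result = {"id": take()}
    if peek() == "name":                 # optional: name "..."
        take("name")
        result["name"] = take()
    take("from")
    result["from"] = take()
    take("to")
    result["to"] = take()
    result["source"] = take()
    result["destination"] = take()
    result["service"] = take()
    if peek() == "nat":                  # optional: nat src|dst [...]
        take("nat")
        result["nat"] = take()
        if peek() == "dip-id":
            take("dip-id")
            result["dip_id"] = take()
        if peek() == "ip":
            take("ip")
            result["nat_ip"] = take()
            if peek() == "port":
                take("port")
                result["nat_port"] = take()
    result["action"] = take()
    if peek() == "schedule":             # optional: schedule "..."
        take("schedule")
        result["schedule"] = take()
    result["log"] = peek() == "log"      # optional trailing "log"
    return result


print(parse_policy('4- set policy id 7000 from "Trust" to "Untrust" '
                   '"servers" "Any" "ANY" deny'))
# {'id': '7000', 'from': 'Trust', 'to': 'Untrust', 'source': 'servers',
#  'destination': 'Any', 'service': 'ANY', 'action': 'deny', 'log': False}
```

Each `if peek() == ...` block plays the role that `Optional(...)` plays in a pyparsing grammar.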
-- Paul